Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/27/2013 05:22 PM, Sven Barth wrote:



> But you do remember that I sent you a list of string types a few days ago?

I just wanted to avoid stating something that might be wrong :-[

-Michael
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Sven Barth
On 27.06.2013 13:37, "Michael Schnell" wrote:
> As I don't have a new Delphi I in turn don't know what exactly
> "UnicodeString" means.

But you do remember that I sent you a list of string types a few days ago?

Regards,
Sven


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Jonas Maebe


On 27 Jun 2013, at 03:54, luiz americo pereira camara wrote:

> 2013/6/21 Sergei Gorelkin :
>
>> I've profiled the code and found no conversions taking place. All the
>> slowdown appears to be caused by other reasons, hard to tell the topmost
>> contributor. What catches the eye is the large amount of calls to
>> UniqueString, and the fact that SetCodePage goes through implicit
>> try..finally block even if it does not need to convert the string.
>
> Seems that Florian changed SetCodePage to avoid implicit try finally.
>
> It improved the performance slightly but still a lot slower than 2.6.X .



The speed is virtually identical for me between FPC 2.6.x and trunk on
Mac OS X/PPC. Of course, it's an incomplete test program and there is
no information about how it is compiled. Additionally, the timings in
the last post in that thread are completely different from the ones in
the first post, so they seem to come from a different test program or
a different system (which should be specified; comparing arbitrary
numbers is useless if you are interested in finding out what the cause
is).

There was a small difference if the program was either compiled with
-Fcxxx or contained a {$codepage xxx} directive. That results in the
constant strings in the program getting that particular code page
rather than CP_ACP in 2.7.1. In this case the test program became 10%
slower. The reason was that new ansistrings created in the RTL (e.g.
for char to ansistring conversion, or when concatenating ansistrings)
get CP_ACP by default, and changing this afterwards to the custom code
page meant going through InternalSetCodePage() with its exception
frame setup/teardown. I solved that in r24985.


Under Linux/i386 I however do still see a significant slowdown  
(regardless of the used code page). Strangely enough, it goes away for  
me if the system unit is compiled with -O2 -Oonostackframe instead of  
with -O2. I don't know why. It might be some ugly cache conflict, but  
for such a small program and little data that is unlikely to be the  
case on modern x86 caches. It might also be code alignment, but some  
playing with -Oaloop and -Oaproc doesn't seem to change it either.



Jonas


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/27/2013 02:52 PM, Marco van de Voort wrote:

> These are all discussions that have raged for years, and an implementation
> was made. Basta.


As I can't do any patch for the compiler myself, I can't comment on that.

-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Marco van de Voort
In our previous episode, Michael Schnell said:

> Yep. But fpc is not windows-centric,

These are all discussions that have raged for years, and an implementation
was made. Basta.


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/27/2013 12:54 PM, Hans-Peter Diettrich wrote:


> Now you also should understand that a string variable points directly
> to the string content, it's usable as PChar(str) without any
> conversion. The other information about the string resides *before*
> that address.

The test program I provided you with some days ago already does this
correctly :-)


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Hans-Peter Diettrich

Michael Schnell wrote:

>> 2) Nothing is copied on an assignment to a string variable, except the
>> reference to the memory object.
>
> Sorry, I erroneously thought about the variable itself being ref
> counted, while in fact the variable is a pointer to the (hidden) String
> management record,

Fine that you're finally getting familiar with the reality of reference
counted objects :-)

> which is the ref counted entity and holds the content
> pointer to the String array.


Now you should also understand that a string variable points directly to
the string content; it's usable as PChar(str) without any conversion.
The other information about the string resides *before* that address.
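To make the layout described above concrete, here is a minimal sketch for FPC 3.0 or later (the release that grew out of the 2.7.1 trunk discussed in this thread). It assumes the system unit helpers StringRefCount, StringElementSize and StringCodePage, and deliberately does not read the header through raw pointer offsets, since the exact record layout is an RTL implementation detail.

```pascal
program StringLayout;
{$mode objfpc}{$H+}
{$assertions on}
var
  s: AnsiString;
begin
  // Build a heap string at run time; literals are stored as constants
  // (reference count -1), which would muddy the picture.
  SetLength(s, 3);
  s[1] := 'a'; s[2] := 'b'; s[3] := 'c';

  // The variable holds the address of the first character, so the PChar
  // typecast yields the very same pointer: no conversion, no offset.
  Assert(Pointer(s) = Pointer(PChar(s)));

  // Length, reference count, element size and code page live in front of
  // that address; these helpers read them from there.
  Assert(Length(s) = 3);
  Assert(StringRefCount(s) = 1);
  Assert(StringElementSize(s) = 1);
  WriteLn('code page: ', StringCodePage(s));

  // An empty string has no record at all: the variable is simply nil and
  // therefore carries no code page of its own.
  s := '';
  Assert(Pointer(s) = nil);
  WriteLn('ok');
end.
```

The last check also illustrates the point made later in the thread: an empty string cannot tell you its encoding, because there is no record to store it in.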


DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/27/2013 01:48 PM, Marco van de Voort wrote:



>> when storing a - say UTF-8 - String in a
>> stringlist and retrieving it later to a String variable with encoding
>> type UTF-8 a dual conversion is done.
>
> Yes.
>
>> To me this seems absolutely silly.
>
> Correct. Using UTF8 on Windows is silly, as it is not a native string type,
> and is never used by default.

Yep. But fpc is not Windows-centric, thus it should not force the user
into an encoding that is suggested by Windows. And TStringList (at least
the definitions in its "interface" section) should not be OS- or
architecture-dependent. Thus using a String type that imposes a fixed
encoding, or (even worse) one that might change according to the
arch/OS setting when compiling, is a rather bad idea.

Imposing an unnecessary dual conversion, or forcing the user to use a
certain encoding when working with TStringList, is a bad idea as well.

Thus IMHO we do need an appropriately versatile String type and a
(decently fast) implementation.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> As I don't have a new Delphi I in turn don't know what exactly 
> "UnicodeString" means.

UTF-16, as has been said hundreds of times, and as can be seen in thousands
of locations on the web. If you don't get these essential features, then all
this discussion is useless.
 
>  From what I read I assume this means "System Encoding" and this again 
> means UTF-16 or UCS2.

That's a bad term. Windows has THREE system encodings: OEM for the legacy
console, one (called ANSI) for 1-byte types, and "UNICODE", which means
UTF-16 (UCS-2 on NT4 and Win2000).

> And if all this is true,  when storing a - say UTF-8 - String in a 
> stringlist and retrieving it later to a String variable with encoding 
> type UTF-8 a dual conversion is done.

Yes.
 
> To me this seems absolutely silly.

Correct. Using UTF8 on Windows is silly, as it is not a native string type,
and is never used by default.



Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/27/2013 01:24 PM, Sven Barth wrote:
> Delphi uses "String" as type for the "TStringList" and thus with
> Delphi 2009 and newer this is "UnicodeString".

I assumed this.

As I don't have a recent Delphi, I in turn don't know exactly what
"UnicodeString" means.

From what I read, I assume this means "System Encoding", which in turn
means UTF-16 or UCS-2.

And if all this is true, then when storing a (say UTF-8) String in a
stringlist and later retrieving it into a String variable with encoding
type UTF-8, a dual conversion is done.

To me this seems absolutely silly.

It might be acceptable with Delphi, which is Windows-centric and in fact
deprecates the use of encodings other than the "System Encoding".

With a cross-platform tool such as fpc, a smarter (though compatible)
implementation should be provided.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/26/2013 06:19 PM, Hans-Peter Diettrich wrote:


> A string variable has no encoding type stored. Only non-empty strings
> have an encoding.

Sorry for the bad wording. Not the String variable itself (as it is just a
pointer to the String record) but the String record it points to has the
field for storing the dynamic encoding type. The string variable of
course only has its static encoding type at compile time.

Nonetheless you need to keep in mind that a string variable with one
(normal) encoding type pointing to a String record that holds a different
encoding type should never happen, and might trigger unpredictable
behavior.




> No string can have an encoding of $FFFF.

Why then is it defined as a constant?

I assume that when creating a RawByteString variable without assigning a
normal string to it, it will point to a String record holding the
encoding type $FFFF, but I might be wrong regarding this implementation
detail.

You might use DXE and test:

program TestRawCP;
{$APPTYPE CONSOLE}
var
  s: RawByteString;
begin
  SetLength(s, 10);            // force allocation of a String record
  WriteLn(StringCodePage(s));  // which encoding type did it get?
end.

This only gets interesting when RawByteString is used not in the way we
discussed just now but in the way its name suggests; that, however, is
yet another issue.


-Michael




Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Sven Barth

On 27.06.2013 13:12, Michael Schnell wrote:
> A prominent example is TStringList. I have no idea how it is
> implemented in DXE, but using "decent" RawByteStrings it can be
> implemented in a way that can be used with all strings without a
> severe performance hit.
Delphi uses "String" as type for the "TStringList" and thus with Delphi 
2009 and newer this is "UnicodeString".


Regards,
Sven


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/26/2013 10:28 PM, Hans-Peter Diettrich wrote:


> Please note that I invited Michael Schnell to provide his version of
> such RTL routines, compatible with *his* ideas about "better" string
> handling.


I would be happy to do this, but unfortunately the modified behavior 
would need to be implemented in the compiler and I don't dare to touch 
that code. The conversion library function would not be affected.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/26/2013 08:01 PM, Sven Barth wrote:


> The RTL already uses RawByteString for the concatenation helpers.

Does this code assign a RawByteString to a normal String whose encoding
types do not already match (and thus create erroneous Strings)?

I would not suppose so.

In that case it would be compatible with the suggested modification.

-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/27/2013 12:26 PM, Marco van de Voort wrote:


> That has already been decided: everything Delphi compatible.
>
> I was just speaking about a hypothetical case.

The starting point of the discussion was the possibility of improving the
compiler/library and potentially introducing mode settings that provide
a different (better) behavior. I don't suppose there is any
documentation on fpc yet that exactly describes the behavior when
assigning a RawByteString to a normal String (in DXE I did not see such
documentation either).

> In general, it has been proven time and time again that deviating from
> Delphi is nearly always the worse choice.

Here the deviation is just a decent implementation for a case that is
deprecated and not well defined in the Delphi docs, and for which a
decent behavior can easily be defined and implemented.

> I don't think quirks mode would be useful. It is not just syntax,

Yes, it is "just syntax", as what I suggest only changes a statement that
is deprecated in DXE anyway (e.g. myString := myRawByteString).

> In general the average code doesn't really honour such fine
> differences, so in practice this doesn't matter.

This is very wrong IMHO.

It offers the possibility of creating functions that can be fed with any
(normal) type of String and act on it without doing a conversion.

A prominent example is TStringList. I have no idea how it is implemented
in DXE, but using "decent" RawByteStrings it could be implemented in a way
that works with all strings without a severe performance hit.

Another example is the Lazarus LCL, which could provide a user interface
done with RawByteStrings and thus allow the user to use whatever encoding
is optimal for his application, while the LCL internally uses the string
type that best matches the underlying OS. The conversion, if necessary,
would automatically be done at a sensible point. The performance hit
would be minimal, as the compiler only needs to generate additional code
(compared to using the same string type everywhere) when a normal String
and a RawByteString meet in a single statement. Given that LCL calls are
not made in tight loops, this would be close to zero.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/26/2013 06:29 PM, Hans-Peter Diettrich wrote:


> Then you have two choices:
> 1) convert the string as required
> 2) copy the content unconverted, but update the encoding

What do you mean by "you have two choices"?
In fact the compiler designer can choose which behavior to implement:

 1) convert the string as required
    (seems most sensible)
 2) copy the content unconverted, but update the encoding
    (does not seem sensible at all, as then the static encoding type of
    the normal target String no longer matches the dynamic encoding
    type. Elsewhere, the code the compiler creates will implicitly use
    the static encoding type (e.g. to decide whether or not a conversion
    is necessary), and the content will be interpreted wrongly.)
 3) issue a warning or (better) an error at compile time for any
    assignment of a RawByteString to a normal String
    (as conversion is not implemented, and not converting leads to
    unpredictable behavior)
 4) raise an exception at run time when the types don't match
    (not nice but consistent)
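Options 1 and 4 above can be approximated today with explicit helper functions instead of compiler magic. The following is a hypothetical sketch: AssignConverted, AssignStrict and ECodePageMismatch are illustrative names, not RTL API. It assumes FPC 3.0 or later, a UTF8String target, and the RTL's SetCodePage/StringCodePage, and it only uses ASCII text so the result does not depend on which widestring manager is installed.

```pascal
program RawToTyped;
{$mode objfpc}{$H+}
{$assertions on}
uses
  SysUtils;

type
  // Hypothetical exception class for option 4.
  ECodePageMismatch = class(Exception);

// Option 1 (hypothetical helper): convert when the dynamic code page of the
// source differs from the static code page of the UTF-8 target.
function AssignConverted(const Src: RawByteString): UTF8String;
var
  Tmp: RawByteString;
begin
  Tmp := Src;                                 // shares the data, no copy yet
  if (Tmp <> '') and (StringCodePage(Tmp) <> CP_UTF8) then
    SetCodePage(Tmp, CP_UTF8, True);          // transcode into a new string
  Result := Tmp;                              // stored code page now matches
end;

// Option 4 (hypothetical helper): refuse the mismatch at run time.
function AssignStrict(const Src: RawByteString): UTF8String;
begin
  if (Src <> '') and (StringCodePage(Src) <> CP_UTF8) then
    raise ECodePageMismatch.CreateFmt('expected code page %d, got %d',
      [CP_UTF8, StringCodePage(Src)]);
  Result := Src;
end;

var
  R: RawByteString;
  U: UTF8String;
  Raised: Boolean;
begin
  // A plain ASCII string explicitly labelled as Windows-1252.
  SetLength(R, 3);
  R[1] := 'a'; R[2] := 'b'; R[3] := 'c';
  SetCodePage(R, 1252, False);

  U := AssignConverted(R);
  Assert((U = 'abc') and (StringCodePage(U) = CP_UTF8));

  Raised := False;
  try
    AssignStrict(R);
  except
    on ECodePageMismatch do Raised := True;
  end;
  Assert(Raised);
  WriteLn('ok');
end.
```

Doing this by hand at each call site is exactly the burden the proposal wants to move into the compiler; the sketch only shows that both behaviors are well defined.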

Of course, appropriate "Delphi Quirks" modes could influence the compiler
in that respect.




> IMO a reasonable decision should take into account the use of the
> RawByteString type in RTL code, e.g. for concatenation.

The RTL of course needs to match the compiler perfectly. But as both are
"under construction" right now (regarding their behavior with this kind
of String), I think that is easily doable.

> Can you show us your intended code for these functions?

What functions? We are talking about compiler behavior.

I think I already wrote down what I meant (the version with just
RawByteString, not the one with an additional String type under another
name, which might be even more "attractive").

I can do this again as a matrix instead of the text version I wrote.

When assigning "such" Strings (I hope the monospace is visible in the
list):

(The compiler does the test for encoding using the static (compile-time)
encoding type with normal strings, and the dynamic (in the string record)
encoding type value for RawByteString.)

Target \ Source                            | normal String   | RawByteString
-------------------------------------------+-----------------+-----------------------------------------------------
normal String, same static encoding        | set pointer     | set pointer (after checking dynamic encoding)
normal String, different static encoding   | call conversion | call conversion (after checking dynamic encoding)
RawByteString (dynamic type ignored)       | set pointer     | set pointer (checking dynamic encoding not necessary)




Note:
 -   if the static types match (be it Raw or not), just set the pointer.
 -   the compiler only needs to issue code to check the dynamic type if
     the source is a RawByteString.
 -   the dynamic type of the target is ignored by the compiler. Only the
     conversion function will use it.
 -   the static type of source and target is not used by the conversion
     library function. It can work according to the dynamic types and
     thus just needs to be given the two string variables (pointers) in
     the call the compiler creates (in assembler object code).
 -   for a normal String, a mismatch between static and dynamic type
     (which would be erroneous in DXE as well) can't happen.
 -   for RawByteString, a "normal" dynamic type means "this is printable
     information", and the dynamic type $FFFF (assigned to the string
     when it was instantiated) means "this String just holds bytes with
     no encoding assumed".
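The matrix can also be read as a small decision function. The sketch below is hypothetical: TStrKind, TAssignAction, MakeKind and ChooseAction are illustrative names, not compiler internals. It simply encodes the three rows above, assuming FPC 3.0 or later for TSystemCodePage, CP_UTF8 and CP_NONE; it models what the compiler would emit and does not change any real assignment.

```pascal
program AssignMatrix;
{$mode objfpc}{$H+}
{$assertions on}

type
  // Static (compile-time) description of one side of an assignment.
  TStrKind = record
    IsRaw: Boolean;               // declared as RawByteString?
    StaticCP: TSystemCodePage;    // declared code page (ignored if IsRaw)
  end;
  TAssignAction = (aaSetPointer, aaCallConversion);

function MakeKind(IsRaw: Boolean; CP: TSystemCodePage): TStrKind;
begin
  Result.IsRaw := IsRaw;
  Result.StaticCP := CP;
end;

// What would be emitted for "Target := Source" under the proposed rules.
// DynamicCP is only consulted when the source is raw; for a normal string
// it always equals the static code page anyway.
function ChooseAction(const Source, Target: TStrKind;
  DynamicCP: TSystemCodePage): TAssignAction;
var
  SourceCP: TSystemCodePage;
begin
  if Target.IsRaw then
    Exit(aaSetPointer);            // raw target: no check necessary
  if Source.IsRaw then
    SourceCP := DynamicCP          // run-time check of the stored code page
  else
    SourceCP := Source.StaticCP;   // decided at compile time
  if SourceCP = Target.StaticCP then
    Result := aaSetPointer
  else
    Result := aaCallConversion;
end;

var
  Utf8, Latin1, Raw: TStrKind;
begin
  Utf8   := MakeKind(False, CP_UTF8);
  Latin1 := MakeKind(False, 28591);    // ISO-8859-1
  Raw    := MakeKind(True, CP_NONE);

  Assert(ChooseAction(Utf8, Utf8, CP_UTF8) = aaSetPointer);
  Assert(ChooseAction(Latin1, Utf8, 28591) = aaCallConversion);
  Assert(ChooseAction(Raw, Utf8, CP_UTF8) = aaSetPointer);
  Assert(ChooseAction(Raw, Utf8, 28591) = aaCallConversion);
  Assert(ChooseAction(Utf8, Raw, CP_UTF8) = aaSetPointer);
  Assert(ChooseAction(Raw, Raw, 28591) = aaSetPointer);
  WriteLn('ok');
end.
```

Note that only the branch with a raw source needs run-time information, which matches the second note above.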


-Michael







Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> > "Should" is a complex thing here, since there is no implementation to 
> > test with (and see if it has other consequences). I assume a 
> > conversion should be inserted, so at least for non rawbytestrings the 
> > runtime encoding always matches the compiletime one. 
> I feel such  implementation details are up to the fpc developers to 
> decide.

That has already been decided: everything Delphi compatible.

I was just speaking about a hypothetical case.

> If everybody agrees that doing a conversion when a RawByteString
> is assigned to a normal String and the dynamic encoding does not match,
> is the better alternative vs the potentially unpredictable behavior in
> DXE, I think this should be the default behavior in fpc.

In general, it has been proven time and time again that deviating from
Delphi is nearly always the worse choice.  It just creates two cases to
check for code that must still compile under Delphi instead of just one.

So even if the extra "case" is better, it still produces more heartbreak for
many people. Unless Delphi changes it in some later version (since then
Delphi codebases will be adapted anyway).
 
> It might be nice to add Delphi quirks modes that issue an error message 
> in that case or just do the assignment even if an "intersexual" String 
> (the static encoding that the compiler sees does not match the dynamic 
> encoding) is the result with unpredictable consequences.

I don't think quirks mode would be useful. It is not just syntax; the
resulting encoding in the string type is different between quirks and
non-quirks mode. IOW passing strings between a quirks and a non-quirks unit
would become very opaque.

> > The whole concept is about compatibility, and that is a race that has
> > already been run.
> The "incompatibility" only arises when doing something that in Delphi XE 
> is depreciated according to the docs (assigning a RawByteString to a 
> normal String), anyway. Thus I don't see any problem with implemented a 
> sensible behavior in that case.

In general the average code doesn't really honour such fine differences, so
in practice this doesn't matter.
 


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/26/2013 05:09 PM, Marco van de Voort wrote:
> "Should" is a complex thing here, since there is no implementation to
> test with (and see if it has other consequences). I assume a
> conversion should be inserted, so at least for non rawbytestrings the
> runtime encoding always matches the compiletime one.

I feel such implementation details are up to the fpc developers to
decide. If everybody agrees that doing a conversion when a RawByteString
is assigned to a normal String and the dynamic encoding does not match
is the better alternative to the potentially unpredictable behavior in
DXE, I think this should be the default behavior in fpc.

It might be nice to add Delphi quirks modes that issue an error message
in that case, or just do the assignment even if the result is an
"intersexual" String (the static encoding that the compiler sees does
not match the dynamic encoding), with unpredictable consequences.

>> The result could be a strange thing that is encoded
>> other than the type requires. To me this behavior is a quirk and
>> should not be kept just for compatibility.
>
> The whole concept is about compatibility, and that is a race that has
> already been run.

The "incompatibility" only arises when doing something that is deprecated
in Delphi XE according to the docs (assigning a RawByteString to a
normal String) anyway. Thus I don't see any problem with implementing a
sensible behavior in that case.

Another option would be to invent yet another String type that basically
is a RawByteString but is treated differently at compile time: when it
is assigned to a normal String that does not match its dynamic encoding,
the conversion library call is done. (In fact this was my initial idea
all along: giving the name RawByteString back the meaning the name
suggests.) When doing so, assigning a RawByteString to a normal String
could be strictly forbidden (unless some Delphi Quirks Mode is set). But
I do see that the additional complexity of defining yet another String
type might not be nice.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell

On 06/27/2013 09:51 AM, Michael Van Canneyt wrote:


> There is no content pointer. The string array is appended to the "record"

I see. Thus the "pointer" is relative and implicit :-P. Silly me.

-Michael




Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Van Canneyt



On Thu, 27 Jun 2013, Michael Schnell wrote:



>> 2) Nothing is copied on an assignment to a string variable, except the
>> reference to the memory object.
>
> Sorry, I erroneously thought about the variable itself being ref counted,
> while in fact the variable is a pointer to the (hidden) String management
> record, which is the ref counted entity and holds the content pointer to the
> String array.

There is no content pointer.
The string array is appended to the "record".


Michael.


Re: [fpc-devel] Performance of string handling in trunk

2013-06-27 Thread Michael Schnell


> 2) Nothing is copied on an assignment to a string variable, except the
> reference to the memory object.

Sorry, I erroneously thought about the variable itself being ref
counted, while in fact the variable is a pointer to the (hidden) String
management record, which is the ref counted entity and holds the content
pointer to the String array.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread luiz americo pereira camara
2013/6/21 Sergei Gorelkin :

> I've profiled the code and found no conversions taking place. All the
> slowdown appears to be caused by other reasons, hard to tell the topmost
> contributor. What catches the eye is the large amount of calls to
> UniqueString, and the fact that SetCodePage goes through implicit
> try..finally block even if it does not need to convert the string.
>

It seems that Florian changed SetCodePage to avoid the implicit try..finally.

It improved the performance slightly, but it is still a lot slower than 2.6.x.

See: 
http://forum.lazarus.freepascal.org/index.php/topic,21223.msg124551.html#msg124551

Luiz


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Hans-Peter Diettrich

Sven Barth wrote:

>> IMO a reasonable decision should take into account the use of the
>> RawByteString type in RTL code, e.g. for concatenation.
>
> The RTL already uses RawByteString for the concatenation helpers.

This means that the assumptions implied by that code have to be matched
by the RawByteString implementation, or the code must be changed when
the RawByteString implementation is changed.

Please note that I invited Michael Schnell to provide his version of
such RTL routines, compatible with *his* ideas about "better" string
handling. Any suggestion to deviate from the Delphi
implementation/behaviour deserves proof that it is useful in the
required low-level string manipulation functions. Only after this step
can we decide for what *other* purposes RawByteString can be used in
user code.


DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Sven Barth
On 26.06.2013 18:30, "Hans-Peter Diettrich" wrote:
>
> Michael Schnell wrote:
>
>> On 06/26/2013 03:44 PM, Marco van de Voort wrote:
>>>
>>> There never is a conversion when assigning to/from rawbytestring,
>>
>> So what do you suggest should happen when assigning a RawByteString to a
>> normal String ? The result could be a strange thing that is encoded other
>> than the type requires. To me this behavior is a quirk and should not be
>> kept just for compatibility.
>
> Then you have two choices:
> 1) convert the string as required
> 2) copy the content unconverted, but update the encoding
>
> IMO a reasonable decision should take into account the use of the
> RawByteString type in RTL code, e.g. for concatenation.

The RTL already uses RawByteString for the concatenation helpers.

Regards,
Sven


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Hans-Peter Diettrich

Michael Schnell wrote:


On 06/26/2013 02:59 PM, Sven Barth wrote:

... And using SetCodePage you can force a conversion.


The docs say:

===
 procedure SetCodePage(var S: RawByteString; CodePage: Word; Convert:
Boolean);

The *SetCodePage* routine sets the code page of a variable of type
RawByteString.



===


This also is compatible with my suggestion:

If the RawByteString variable already has a dynamic encoding type
other than $FFFF, a conversion might or might not be necessary.


A string variable has no encoding type stored. Only non-empty strings 
have an encoding.


No string can have an encoding of $FFFF.
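The Convert flag of SetCodePage decides whether the bytes are transcoded or merely relabelled. A minimal FPC 3.0+ sketch of the relabelling case follows; it produces exactly the kind of string discussed in this thread, whose stored code page no longer describes its contents. With Convert = True the RTL would transcode instead (to $C3 $A9 for this byte), but on Unix that may require a widestring manager such as the cwstring unit, so the sketch does not rely on it.

```pascal
program RelabelDemo;
{$mode objfpc}{$H+}
{$assertions on}
var
  s: RawByteString;
begin
  // A single byte $E9, which is 'e acute' in Windows-1252.
  SetLength(s, 1);
  s[1] := #$E9;
  SetCodePage(s, 1252, False);       // only declare what the byte means

  // Convert = False merely overwrites the code page stored in the string
  // record. The byte stays $E9, which is not valid UTF-8, so the string now
  // claims an encoding its contents do not have.
  SetCodePage(s, CP_UTF8, False);
  Assert(StringCodePage(s) = CP_UTF8);
  Assert((Length(s) = 1) and (Ord(s[1]) = $E9));
  WriteLn('ok');
end.
```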

DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Hans-Peter Diettrich



Michael Schnell wrote:

> On 06/26/2013 03:44 PM, Marco van de Voort wrote:
>
>> There never is a conversion when assigning to/from rawbytestring,
>
> So what do you suggest should happen when assigning a RawByteString to a
> normal String ? The result could be a strange thing that is encoded
> other than the type requires. To me this behavior is a quirk and
> should not be kept just for compatibility.


Then you have two choices:
1) convert the string as required
2) copy the content unconverted, but update the encoding

IMO a reasonable decision should take into account the use of the 
RawByteString type in RTL code, e.g. for concatenation.


Can you show us your intended code for these functions?

DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Hans-Peter Diettrich

Michael Schnell wrote:

> On 06/26/2013 12:05 PM, Hans-Peter Diettrich wrote:
>> After an assignment both strings refer to the same memory, i.e.
>> pchar(s1)=pchar(s2). Everything else indicates an error, somewhere.
>
> This is exactly what I wanted to show: it results in ContentPointer,
> StringLength, ReferenceCount (plus - if no auto-conversion is done -
> supposedly EncodingType and ElementSize in DXE) being identical for both
> strings after the assignment. Thus a RawByteString supposedly will in
> fact get the source's encoding type.


1) AnsiString has no ContentPointer.
2) Nothing is copied on an assignment to a string variable, except the 
reference to the memory object.
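The reference-only assignment, and the copy-on-write that follows a modification, can be observed with a small FPC program. This is a sketch assuming FPC 3.0 or later and its StringRefCount helper; the string is built at run time because literals are stored as constants with a reference count of -1.

```pascal
program StringAssignment;
{$mode objfpc}{$H+}
{$assertions on}
var
  s1, s2: AnsiString;
begin
  SetLength(s1, 3);
  s1[1] := 'a'; s1[2] := 'b'; s1[3] := 'c';   // heap string, refcount 1

  s2 := s1;                                   // only the reference is copied ...
  Assert(Pointer(s1) = Pointer(s2));          // ... so both share the characters
  Assert(StringRefCount(s1) = 2);             // and the record counts two owners

  s2[1] := 'x';                               // a write forces a private copy
  Assert(Pointer(s1) <> Pointer(s2));
  Assert(StringRefCount(s1) = 1);
  Assert((s1 = 'abc') and (s2 = 'xbc'));
  WriteLn('ok');
end.
```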


DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Hans-Peter Diettrich

Sven Barth wrote:

> On 26.06.2013 14:02, Michael Schnell wrote:
>
>> On 06/26/2013 01:40 PM, Sven Barth wrote:
>>> It's the whole use of RawByteString that the encoding is kept. For
>>> all other string types the content will be converted
>>
>> That is what I assumed, but I understood DoDi as suggesting that it is
>> not possible (with normal means such as assigning to another String) to
>> make use of the encoding type information of a String that had been
>> assigned to a RawByteString.
>
> *sigh*

+1 ;-)

> See here: http://docwiki.embarcadero.com/VCL/XE/de/System.StringCodePage


The documentation is not complete. Empty strings have no associated
string record, thus no encoding; StringCodePage always returns CP_ACP
for empty strings.


This means that Delphi offers no means to determine the static 
(declared) type of an AnsiString variable (except by RTTI?).


This also requires compiler magic on string assignments, so that the 
static encoding of the target variable can be determined and used to 
force a conversion if required, even if the target is an empty string. 
This magic seems to be buggy, or inconsistent at least, as observed in 
my test programs. When a RawByteString is assigned to an AnsiString 
variable, both variables refer to the same string memory. Afterwards the 
AnsiString can show strange behaviour, as long as it retains a "foreign" 
encoding :-(


From ms-help://embarcadero.rs_xe/rad/String_Types.html
>>
The RawByteString type is type AnsiString($FFFF). RawByteString enables 
the passing of string data of any code page without doing any code page 
conversions. RawByteString should only be used as a const or value type 
parameter or a return type from a function. It should never be passed by 
reference (passed by var), and should never be instantiated as a variable.

<<
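The usage recommended in that quote (RawByteString as a const parameter) can be sketched like this, assuming FPC 3.x. TLatin1String, Stamped and DynamicCodePage are made-up names, and code pages 28591 (ISO-8859-1) and 28592 (ISO-8859-2) are arbitrary choices that are unlikely to be the system default. The sketch only checks that strings with different code pages reach the parameter without being converted.

```pascal
program RawByteStringParam;
{$mode objfpc}{$H+}
{$C+} // enable Assert

type
  // Hypothetical string type declared with code page ISO-8859-1 (28591).
  TLatin1String = type AnsiString(28591);

// Hypothetical helper: return Text relabelled as code page CP.
// Convert = False only rewrites the code page field; the bytes stay as is.
function Stamped(const Text: RawByteString; CP: TSystemCodePage): RawByteString;
begin
  Result := Text;
  SetCodePage(Result, CP, False);
end;

// The intended use of RawByteString: a const parameter that accepts a
// string of any code page and passes it through without conversion.
function DynamicCodePage(const S: RawByteString): TSystemCodePage;
begin
  Result := StringCodePage(S);
end;

var
  l: TLatin1String;
  r: RawByteString;
begin
  l := Stamped('abc', 28591);  // dynamic page equals the declared one
  r := Stamped('abc', 28592);  // ISO-8859-2, kept as is by RawByteString
  WriteLn(DynamicCodePage(l), ' ', DynamicCodePage(r));
  Assert(DynamicCodePage(l) = 28591);
  Assert(DynamicCodePage(r) = 28592);
end.
```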

I'd extend this warning, that a RawByteString never should be assigned 
to an AnsiString variable, because the behaviour of that variable 
becomes almost unpredictable then.

[Unless some newer Delphi version fixes this flaw]


WRT performance, FPC can make use of that undefined behaviour and 
create the most performant code by not checking and handling the 
aforementioned situations. Or FPC can implement more consistent 
behaviour (to be defined).


DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> > There never is a conversion when assigning to/from rawbytestring,
> 
> So what do you suggest should happen when assigning a RawByteString to a 
> normal String?

"Should" is a complex thing here, since there is no implementation  to test
with (and see if it has other consequences).

I assume a conversion should be inserted, so at least for non rawbytestrings
the runtime encoding always matches the compiletime one.

> The result could be a strange thing that is encoded 
> other than the type requires. To me this behavior is a quirk and 
> should not be kept just for compatibility.

The whole concept is about compatibility, and that is a race that has
already been run.


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> On 06/26/2013 03:44 PM, Marco van de Voort wrote:
> > There never is a conversion when assigning to/from rawbytestring, so 
> > this is a strange line.
> 
> Sven replied to my contribution that suggested an implementation that in 
> fact does a conversion when doing an assignment from a RawByteString to 
> a normal String when appropriate.
> What _is_ (in DXE) is not being discussed here (other than for 
> comparison).

I was thinking about the code below, which prints "1" on Windows and "0" on
Linux. Especially the Windows answer is interesting. The Linux result can
probably be explained by the Windows-specific OEM code page concept not
being implemented there.

{$mode delphiunicode}
const cp_oemcp=1;

type oemstring = type ansistring(cp_OEMCP);

function xx:ansistring;

var nn:rawbytestring;
begin
  setlength(nn,1);
  nn[1]:=#121;
  setcodepage(nn,cp_oemcp);
  result:=nn;
end;

var v : ansistring;

begin
 v:=xx;
 writeln(stringcodepage(v));
end.


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/26/2013 03:44 PM, Marco van de Voort wrote:

There never is a conversion when assigning to/from rawbytestring,


So what do you suggest should happen when assigning a RawByteString to a 
normal String? The result could be a strange thing that is encoded 
other than the type requires. To me this behavior is a quirk and 
should not be kept just for compatibility.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/26/2013 03:44 PM, Marco van de Voort wrote:
There never is a conversion when assigning to/from rawbytestring, so 
this is a strange line.


Sven replied to my contribution that suggested an implementation that in 
fact does a conversion when doing an assignment from a RawByteString to 
a normal String when appropriate.


What _is_ (in DXE) is not being discussed here (other than for 
comparison).


 - Michael





Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> 
> If the RawByteString variable already has a dynamic encoding type other
> than $FFFF, a conversion might or might not be necessary.

There never is a conversion when assigning to/from rawbytestring, so this is 
a strange line.


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/26/2013 02:59 PM, Sven Barth wrote:

 ... And using SetCodePage you can force a conversion.


The docs say:

===
procedure SetCodePage(var S: RawByteString; CodePage: Word; Convert: Boolean);

The *SetCodePage* routine sets the code page of a RawByteString-type 
variable.


===

This is also compatible with my suggestion:

If the RawByteString variable already has a dynamic encoding type other than 
$FFFF, a conversion may or may not be necessary.

This is the same action as assigning a string from a RawByteString.

Thus the function could use an intermediate string variable, set its dynamic 
encoding type field to "CodePage", and just call the same conversion function 
that the compiler magic calls when assigning strings.

The conversion function does not use the static encoding type of its 
arguments (it cannot even know it). So it will do the appropriate conversion, 
even if the target in the calling function is a RawByteString. (In fact it will 
never be called with a RawByteString target when the compiler magic creates 
the call, as opposed to the call being made manually by the SetCodePage code.)

I see no problem here either.
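What SetCodePage does with Convert = False can be checked with a few lines, assuming FPC 3.x: the code page field is rewritten while the bytes stay exactly as they are. With Convert = True the bytes would be transcoded instead, which on Unix needs a widestring manager such as the cwstring unit, so that variant is left out of this sketch. Code page 28591 (ISO-8859-1) is an arbitrary example.

```pascal
program SetCodePageRelabel;
{$mode objfpc}{$H+}
{$C+} // enable Assert

var
  r: RawByteString;
begin
  SetLength(r, 1);
  r[1] := #$E9;                  // a single non-ASCII byte
  SetCodePage(r, 28591, False);  // only the code page field changes
  Assert(StringCodePage(r) = 28591);
  Assert((Length(r) = 1) and (r[1] = #$E9));  // byte left untouched
  WriteLn('page ', StringCodePage(r), ', byte $', HexStr(Ord(r[1]), 2));
end.
```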

-Michael





Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/26/2013 02:59 PM, Sven Barth wrote:


It's a counter argument to "it is not possible to make use of the 
encoding type of a String information that had been assigned to a 
RawByteString". This function returns the current code page of the 
string. And using SetCodePage you can force a conversion.




I don't see the problem.

(1)
fpc does not need to closely follow Delphi in things that are only 
seldom used by average application programmers, if there are good 
reasons for a different (better) implementation that is still reasonably 
compatible.


(2)
According to the description, StringCodePage returns "the Codepage". 
According to your wording it returns "the current code page". With 
normal strings the _current_ (aka dynamic) code page is always identical 
to the code page the string was given (by the compiler) when it was 
instantiated. (Otherwise the string would be a hybrid, i.e. erroneous, 
and would behave erratically.) That is why the (sloppy) wording of the 
description omits this difference.


As you stated before, the RawByteString _does_ preserve the encoding 
type of the information that is assigned to it. It can only do this 
using its dynamic "EncodingType" field. Thus it makes sense that the 
function returns the dynamic EncodingType with RawByteStrings.


Thus it can simply always return the dynamic EncodingType.

And this is exactly the information that (IMHO) should be used when 
auto-converting, with the only exception being assignments _to_ a 
RawByteString (_static_ encoding type $FFFF). The compiler can easily 
decide that at compile time (and here, as in many other cases, it does not 
even need to call the library, as the assignment is simply done by setting 
the pointer and increasing the RefCount), which IMHO should be done 
"inline".
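Whether assigning _to_ a RawByteString is a plain reference assignment can be checked directly. A sketch assuming FPC 3.x; TLatin1String is a made-up type with the arbitrary declared page 28591. The program checks that both variables end up pointing at the same string record, so the dynamic code page necessarily travels along with it.

```pascal
program AssignToRawByteString;
{$mode objfpc}{$H+}
{$C+} // enable Assert

type
  TLatin1String = type AnsiString(28591);  // hypothetical declared page

var
  l: TLatin1String;
  r: RawByteString;
begin
  SetLength(l, 3);
  l[1] := 'a';
  l[2] := 'b';
  l[3] := 'c';
  r := l;  // assigning *to* RawByteString: no conversion, only a reference
  Assert(Pointer(r) = Pointer(l));                // one shared record
  Assert(StringCodePage(r) = StringCodePage(l));  // so the page is shared too
  WriteLn('shared record, dynamic page ', StringCodePage(r));
end.
```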


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Sven Barth

On 26.06.2013 14:46, Michael Schnell wrote:

On 06/26/2013 02:08 PM, Sven Barth wrote:

On 26.06.2013 14:02, Michael Schnell wrote:
That is what I assumed, but I understood DoDi to be saying that it is 
not possible (by normal means, such as assigning to another String) to 
make use of the encoding type of string data that has been assigned to 
a RawByteString.

*sigh*
See here: http://docwiki.embarcadero.com/VCL/XE/de/System.StringCodePage

Sorry, I don't see what this (very sloppily) worded page (which of 
course I already knew) tells me about the things in question:
 - static (known to the compiler) vs dynamic (stored with the string) 
encoding type
 - how the compiler and library handles RawByteString as source and/or 
target of an assignment.


It's a counter argument to "it is not possible to make use of the 
encoding type of a String information that had been assigned to a 
RawByteString". This function returns the current code page of the 
string. And using SetCodePage you can force a conversion.


Regards,
Sven


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/26/2013 02:08 PM, Sven Barth wrote:

On 26.06.2013 14:02, Michael Schnell wrote:
That is what I assumed, but I understood DoDi to be saying that it is 
not possible (by normal means, such as assigning to another String) to 
make use of the encoding type of string data that has been assigned to 
a RawByteString.

*sigh*
See here: http://docwiki.embarcadero.com/VCL/XE/de/System.StringCodePage

Sorry, I don't see what this (very sloppily) worded page (which of course 
I already knew) tells me about the things in question:
 - static (known to the compiler) vs dynamic (stored with the string) 
encoding type
 - how the compiler and library handles RawByteString as source and/or 
target of an assignment.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Sven Barth

On 26.06.2013 14:02, Michael Schnell wrote:

On 06/26/2013 01:40 PM, Sven Barth wrote:
The whole point of RawByteString is that the encoding is kept. For 
all other string types the content will be converted.


That is what I assumed, but I understood DoDi to be saying that it is 
not possible (by normal means, such as assigning to another String) to 
make use of the encoding type of string data that has been assigned to 
a RawByteString.

*sigh*
See here: http://docwiki.embarcadero.com/VCL/XE/de/System.StringCodePage

Regards,
Sven


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/26/2013 01:40 PM, Sven Barth wrote:
The whole point of RawByteString is that the encoding is kept. For all 
other string types the content will be converted.


That is what I assumed, but I understood DoDi to be saying that it is 
not possible (by normal means, such as assigning to another String) to 
make use of the encoding type of string data that has been assigned to 
a RawByteString.


Thanks for the affirmation
-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Sven Barth

On 26.06.2013 13:59, Michael Schnell wrote:

BTW.

I think the implementation would be quite easy, straightforward, fast 
and compatible.


 - The compiler knows the static encoding type of each string variable.
 - The dynamic encoding type of a String is preset to the static 
encoding type when the string is allocated
 - only RawByteStrings (EncodingType $FFFF) are allowed to change 
their dynamic encoding type; with other Strings this will lead to 
unpredictable results



When Strings are assigned:
 - If the static encoding type of source and target is identical (be 
it normal or RAW) (already checked by the compiler) -> the same 
happens as with the pre-Unicode compiler (setting the pointer to the 
StringRecord and managing the RefCount)

otherwise:
 - If the target is statically defined as RawByteString (already 
checked by the compiler) -> the same happens
 - If the source is statically defined as RawByteString (already 
checked by the compiler), code is implemented that checks if the 
dynamic encoding of the source is identical to the (known to the 
compiler) static encoding type of the target -> the same happens


otherwise the conversion library is called. It checks the _dynamic_ 
encoding type of source and target (thus it only needs to be provided 
with the Strings themselves and no additional information generated by 
the compiler) and does the conversion appropriately.



When doing operations on two Strings (such as "+" and compare), one of 
the operands is (virtually) copied to a String with the same encoding 
type as the other.


Here:
 - if one operand is a RawByteString use the (static or dynamic) 
encoding of the other.
 - if both are RawByteStrings, use the dynamic encoding of one of them 
(supposedly this is not really a separate case from the previous one)


If the conversion library sees a dynamic encoding type of $FFFF for 
either source or target, it will fail and raise an exception.


See my previously sent answer...

Regards,
Sven


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

BTW.

I think the implementation would be quite easy, straightforward, fast 
and compatible.


 - The compiler knows the static encoding type of each string variable.
 - The dynamic encoding type of a String is preset to the static 
encoding type when the string is allocated
 - only RawByteStrings (EncodingType $FFFF) are allowed to change their 
dynamic encoding type; with other Strings this will lead to 
unpredictable results



When Strings are assigned:
 - If the static encoding type of source and target is identical (be it 
normal or RAW) (already checked by the compiler) -> the same happens as 
with the pre-Unicode compiler (setting the pointer to the StringRecord 
and managing the RefCount)

otherwise:
 - If the target is statically defined as RawByteString (already 
checked by the compiler) -> the same happens
 - If the source is statically defined as RawByteString (already 
checked by the compiler), code is implemented that checks if the dynamic 
encoding of the source is identical to the (known to the compiler) 
static encoding type of the target -> the same happens


otherwise the conversion library is called. It checks the _dynamic_ 
encoding type of source and target (thus it only needs to be provided 
with the Strings themselves and no additional information generated by 
the compiler) and does the conversion appropriately.
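The assignment rules above can be written down as a small decision function. This is only a model of the proposal, not how FPC or Delphi actually implement assignments; cpRaw, TAssignAction and DecideAssign are hypothetical names, and $FFFF is taken as the declared page of RawByteString (the value FPC calls CP_NONE).

```pascal
program AssignRuleSketch;
{$mode objfpc}{$H+}
{$C+} // enable Assert

const
  cpRaw = $FFFF;  // declared page of RawByteString (CP_NONE in FPC)

type
  TAssignAction = (aaShareRecord, aaConvert);

// Model of the proposed rules. StaticSrc/StaticDst are the declared
// (compile-time) pages, DynamicSrc is the page stored in the source record.
function DecideAssign(StaticSrc, StaticDst, DynamicSrc: Word): TAssignAction;
begin
  if StaticSrc = StaticDst then
    Result := aaShareRecord   // same declared type (decided at compile time)
  else if StaticDst = cpRaw then
    Result := aaShareRecord   // RawByteString target (compile time)
  else if (StaticSrc = cpRaw) and (DynamicSrc = StaticDst) then
    Result := aaShareRecord   // RawByteString source, pages match (run time)
  else
    Result := aaConvert;      // hand over to the conversion library
end;

begin
  Assert(DecideAssign(65001, 65001, 65001) = aaShareRecord);
  Assert(DecideAssign(65001, cpRaw, 65001) = aaShareRecord);
  Assert(DecideAssign(cpRaw, 1252, 1252) = aaShareRecord);
  Assert(DecideAssign(cpRaw, 1252, 65001) = aaConvert);
  Assert(DecideAssign(1252, 65001, 1252) = aaConvert);
  WriteLn('all rules behave as described');
end.
```

Only the third branch needs code at run time; the first two can be resolved entirely by the compiler, which is where the proposal expects its speed to come from.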



When doing operations on two Strings (such as "+" and compare), one of 
the operands is (virtually) copied to a String with the same encoding 
type as the other.


Here:
 - if one operand is a RawByteString use the (static or dynamic) 
encoding of the other.
 - if both are RawByteStrings, use the dynamic encoding of one of them 
(supposedly this is not really a separate case from the previous one)


If the conversion library sees a dynamic encoding type of $FFFF for 
either source or target, it will fail and raise an exception.
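The rule for binary operations can be modelled the same way. Again this is only a sketch of the proposal, not of any existing compiler; cpRaw and OperationCodePage are hypothetical. Where the proposal leaves the choice open, the sketch makes an assumption: for two typed operands with different pages the right operand is converted to the page of the left one, and when both operands are RawByteStrings the left operand's dynamic page is used.

```pascal
program OperationRuleSketch;
{$mode objfpc}{$H+}
{$C+} // enable Assert

const
  cpRaw = $FFFF;  // declared page of RawByteString (CP_NONE in FPC)

// Model of the proposed rule for "+" and comparisons: returns the page both
// operands are brought to. StaticX = declared page, DynamicA = page stored
// in the left operand (the right operand's dynamic page is never needed).
function OperationCodePage(StaticA, DynamicA, StaticB: Word): Word;
begin
  if StaticA <> cpRaw then
    Result := StaticA     // left typed: its page wins (assumption if B typed)
  else if StaticB <> cpRaw then
    Result := StaticB     // left raw, right typed: use the typed page
  else
    Result := DynamicA;   // both raw: assumption, take the left dynamic page
end;

begin
  Assert(OperationCodePage(cpRaw, 1252, 65001) = 65001);
  Assert(OperationCodePage(65001, 65001, cpRaw) = 65001);
  Assert(OperationCodePage(cpRaw, 1252, cpRaw) = 1252);
  Assert(OperationCodePage(1252, 1252, 65001) = 1252);
  WriteLn('ok');
end.
```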



IMHO it makes much more sense to implement things like TStringList 
based on RawByteString: when it is based on the default system 
encoding, there will be a double conversion whenever it is used with any 
other encoding type.


IMHO large, commonly used, architecture-independent libraries that are 
not performance-critical (like the LCL) should use RawByteString in their 
user interface and internally as widely as possible, so that conversions 
are avoided whenever possible (e.g. when the user's call provides a string 
and during the work in the library it turns out that it is not actually used).
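A container with a RawByteString interface, as suggested above, could look roughly like this. TRawList is a hypothetical class, not part of the RTL (FPC's TStringList stores plain string); the sketch assumes FPC 3.x, uses the arbitrary page 28591, and only shows that an item goes in and comes out as the same string record with its dynamic code page untouched.

```pascal
program RawListSketch;
{$mode objfpc}{$H+}
{$C+} // enable Assert

type
  // Hypothetical container with a RawByteString interface: items are kept
  // as references, so their dynamic code page is never converted.
  TRawList = class
  private
    FItems: array of RawByteString;
    function GetItem(Index: Integer): RawByteString;
  public
    procedure Add(const S: RawByteString);
    function Count: Integer;
    property Items[Index: Integer]: RawByteString read GetItem; default;
  end;

procedure TRawList.Add(const S: RawByteString);
begin
  SetLength(FItems, Length(FItems) + 1);
  FItems[High(FItems)] := S;  // reference assignment, no conversion
end;

function TRawList.Count: Integer;
begin
  Result := Length(FItems);
end;

function TRawList.GetItem(Index: Integer): RawByteString;
begin
  Result := FItems[Index];
end;

var
  list: TRawList;
  a, item: RawByteString;
begin
  a := 'abc';
  SetCodePage(a, 28591, False);  // arbitrary non-default page, bytes kept
  list := TRawList.Create;
  try
    list.Add(a);
    item := list[0];
    Assert(list.Count = 1);
    Assert(Pointer(item) = Pointer(a));    // same string record
    Assert(StringCodePage(item) = 28591);  // dynamic page preserved
    WriteLn('stored and retrieved unchanged, page ', StringCodePage(item));
  finally
    list.Free;
  end;
end.
```

Any conversion only happens once an item is assigned to a variable with a different declared code page, which is the behaviour the proposal asks for.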


-Michael (the weird one)



Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Sven Barth

On 26.06.2013 12:38, Michael Schnell wrote:

On 06/26/2013 12:13 PM, Sven Barth wrote:
You do know that s2 will point to the same record of s1 after the 
assignment? The contents of the string record are not copied, only 
the pointer of s2 will change. See this example:




You are right (my testing program in pre-Unicode-Delphi does show 
exactly this).


But what I wanted to show is that more is done here than just managing 
the reference count. Regarding the dispute I had with dodi, he is thus 
right that the length is not exactly _copied_over_, but the pointer is 
managed in a way that the same length (and content) is shown. (I admit 
that he is correct in calling this just "reference counting".)


Regarding the underlying discussion about RawByteString:
If exactly this is done when assigning a normal String variable to a 
RawByteString variable, then exactly what I suppose (and dodi seems to 
deny) happens: the dynamic encoding type of the RawByteString 
(target) will be set to the encoding type of the normal String 
(source). Thus the encoding type is _not_ lost, and (in principle) when 
assigning a RawByteString to a normal String, the library would be 
able to check the actual dynamic encoding type of the source against 
the (static=dynamic) encoding type of the target and do a conversion 
if appropriate. IMHO this would be a very sensible behavior.
The whole point of RawByteString is that the encoding is kept. For all 
other string types the content will be converted. See also 
compare_defs_ext in compiler/defcmp.pas around line 445 (look for "don't 
convert ansistrings").


Regards,
Sven


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/26/2013 12:05 PM, Hans-Peter Diettrich wrote:
After an assignment both strings refer to the same memory, i.e. 
pchar(s1)=pchar(s2). Everything else indicates an error somewhere.


This is exactly what I wanted to show: it results in ContentPointer, 
StringLength, ReferenceCount (plus - if no auto-conversion is done - 
supposedly EncodingType and ElementSize in DXE) being identical for both 
strings after the assignment. Thus a RawByteString supposedly will in 
fact get the source's encoding type.


(see my mail to Sven).

-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/26/2013 12:13 PM, Sven Barth wrote:
You do know that s2 will point to the same record of s1 after the 
assignment? The contents of the string record are not copied, only the 
pointer of s2 will change. See this example:




You are right (my testing program in pre-Unicode-Delphi does show 
exactly this).


But what I wanted to show is that more is done here than just managing 
the reference count. Regarding the dispute I had with dodi, he is thus 
right that the length is not exactly _copied_over_, but the pointer is 
managed in a way that the same length (and content) is shown. (I admit 
that he is correct in calling this just "reference counting".)


Regarding the underlying discussion about RawByteString:
If exactly this is done when assigning a normal String variable to a 
RawByteString variable, then exactly what I suppose (and dodi seems to 
deny) happens: the dynamic encoding type of the RawByteString (target) 
will be set to the encoding type of the normal String (source). Thus the 
encoding type is _not_ lost, and (in principle) when assigning a 
RawByteString to a normal String, the library would be able to check the 
actual dynamic encoding type of the source against the (static=dynamic) 
encoding type of the target and do a conversion if appropriate. IMHO 
this would be a very sensible behavior.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Hans-Peter Diettrich

Michael Schnell wrote:

On 06/25/2013 01:20 PM, Hans-Peter Diettrich wrote:

Michael Schnell wrote:
Supposedly the length and encoding number and code-bytecount are 
copied, too.


Please understand reference counted memory objects :-]

Please check this program I tested with a pre-Unicode Delphi.

It shows that (of course) the string length gets copied when assigning a 
string variable to another and how it is done.


I don't see how this is checked by your code.

After an assignment both strings refer to the same memory, i.e. 
pchar(s1)=pchar(s2). Everything else indicates an error somewhere.


DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Sven Barth

On 26.06.2013 09:41, Michael Schnell wrote:

On 06/25/2013 01:20 PM, Hans-Peter Diettrich wrote:

Michael Schnell wrote:
Supposedly the length and encoding number and code-bytecount are 
copied, too.


Please understand reference counted memory objects :-]

Please check this program I tested with a pre-Unicode Delphi.

It shows that (of course) the string length gets copied when assigning 
a string variable to another and how it is done.


You do know that s2 will point to the same record of s1 after the 
assignment? The contents of the string record are not copied, only the 
pointer of s2 will change. See this example:


=== code begin ===

program tstrassign;

{$apptype console}
{$ifdef fpc}
  {$H+}
{$endif}

{$ifndef fpc}
uses
  SysUtils;

function hexstr(ptr: Pointer): String;
begin
  Result := IntToHex(Integer(ptr), 8);
end;
{$endif}

var
  s1, s2: String;
begin
  s1 := 'Test';
  Writeln(hexstr(Pointer(s1)), ' ', hexstr(Pointer(s2)));
  s2 := s1;
  Writeln(hexstr(Pointer(s1)), ' ', hexstr(Pointer(s2)));
end.

=== code end ===

Regards,
Sven


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/26/2013 09:41 AM, Michael Schnell wrote:


It shows ... how it is done.



Hi DoDi,

Would you be inclined to extend the test program for me and compile it 
with DXE?


As far as I understand the encoding type, and as I see in 
http://wiki.freepascal.org/FPC_Unicode_support :



type
  TRefStringRec = packed record
    Encoding: word;     // encoding of string
    ElementSize: byte;  // size in bytes of string's element (1-4)
    Ref: SizeInt;       // number of references
    Len: SizeInt;       // number of elements in string
  end;


(In fact I suppose that a dummy byte is inserted to prevent the SizeInt 
fields from being misaligned.)

The encoding type information should be just before the ref counter and thus 
adding something like


 v1 := PInteger(j1-12);
 v2 := PInteger(j2-12);

and printing v1^ and v2^ in hex should show this information.
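Instead of peeking at fixed negative offsets (which depend on the record layout and on the pointer size), the header fields can also be read through RTL functions. A sketch assuming FPC 3.x, where StringCodePage, StringElementSize and StringRefCount exist (Delphi 2009 and later provide functions with the same names). The string is built at run time so that it is a heap string with a real reference count rather than a read-only literal.

```pascal
program StringHeaderInfo;
{$mode objfpc}{$H+}
{$C+} // enable Assert

var
  s1, s2: AnsiString;
begin
  s1 := 'ABC';
  s1 := s1 + 'DEFG';  // built at run time: a heap string with refcount 1
  s2 := s1;           // reference assignment: both share one record
  Assert(Pointer(s1) = Pointer(s2));
  Assert(StringRefCount(s1) = 2);
  Assert(StringElementSize(s1) = 1);  // AnsiString elements are single bytes
  WriteLn('code page ', StringCodePage(s1),
          ', element size ', StringElementSize(s1),
          ', refcount ', StringRefCount(s1),
          ', length ', Length(s1));
end.
```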

Now you could test in the newest DXE version what happens when assigning a 
normal string to a RawByteString and vice versa.

Thanks for helping out...
-Michael



Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/25/2013 01:19 PM, Hans-Peter Diettrich wrote:


This is not the case :-(

A variable can not force a conversion, when a RawByteString is 
assigned to it :-(
I suppose you tested this properly with the newest version of Delphi XE. 
I can't comment, as I don't have DXE :-(


But you and the docs state that RawByteString is not intended to hold a 
string of unencoded raw bytes that are never supposed to represent 
printable characters; despite its name it is (e.g.) supposed to be used 
as a formal parameter to which a String holding printable characters with 
a known encoding is assigned. Thus I in fact fail to see the sense of its 
existence, if the information about the encoding type of the string 
assigned (without conversion) to it really is lost.


OTOH it seems easy to understand and to implement that/if the 
dynamic EncodingType tag (which according to the docs exists in the 
string management record together with the ContentPointer, the 
StringLength and the ReferenceCounter) is updated with that information 
during the assignment (in the same way as ContentPointer and 
StringLength). This would allow for proper use of such a type variant 
and IMHO should be the way to go for fpc. This would be perfectly 
compatible, even if Delphi does not allow for such usage of the 
RawByteString type. It would not slow down anything if you don't use 
that feature, and IMHO the performance hit would be close to zero (and 
still rather compatible) if stuff like TStringList were implemented using 
this feature.


Effect:
 - Such a TStringList would be able to work with any String type 
without ever forcing an auto-conversion (unless you check out a string 
to a variable of a different (static) type).
 - Lazarus could use such a string type as its interface to the user 
code. This would allow for largely using the same code for multiple 
archs, independent of the user code. IMHO, the performance hit for this 
should be small, as these interface functions are mostly not used in 
tight loops.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/25/2013 01:25 PM, Hans-Peter Diettrich wrote:


8 or 16 bit codes ? In Delphi XE this seems to be 16 bit, in Delphi 7 
and the currently released Lazarus this seems to be 8 Bits.


Please read before confusing everything.
Sorry that I maybe did not phrase my question/request appropriately: I 
am interested in clear terms for use in a discussion. To demonstrate 
that, I just wanted to show that the same language keywords are used with 
different meanings in different versions of the compilers / libraries.



Your recent messages still indicate that you never understood even 
string basics. Why don't you start adjusting your weird mind to the 
facts, as have been given repeatedly since years? :-(
Sorry again. We are (still) discussing how an implementation in fpc can 
potentially be better than that in DXE. Thus the "basics" and the 
"facts" are not really of interest.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-26 Thread Michael Schnell

On 06/25/2013 01:20 PM, Hans-Peter Diettrich wrote:

Michael Schnell wrote:
Supposedly the length and encoding number and code-bytecount are 
copied, too.


Please understand reference counted memory objects :-]

Please check this program I tested with a pre-Unicode Delphi.

It shows that (of course) the string length gets copied when assigning a 
string variable to another and how it is done.


-Michael




===0
var
 s1, s2: String;
 i1, i2: Integer;
 j1, j2: Integer;
 l1, l2: Integer;
 p1, p2: PChar;
 x1, x2: PInteger;
 y1, y2: PInteger;
 z1, z2: PInteger;
procedure TForm12.FormCreate(Sender: TObject);
begin

 s1 := 'ABCDEFG';


 l1 := Length(s1);
 l2 := Length(s2);
 p1 := pchar(s1);
 p2 := pchar(s2);
 i1 := Integer(@s1);
 i2 := Integer(@s2);
 j1 := Integer(p1);
 j2 := Integer(p2);
 x1 := PInteger(j1);
 x2 := PInteger(j2);
 y1 := PInteger(j1-4);
 y2 := PInteger(j2-4);
 z1 := PInteger(j1-8);
 z2 := PInteger(j2-8);
 Memo1.Lines.Add('l : ' + IntToStr(l1) + ' ' + IntToStr(l2));
 Memo1.Lines.Add('i : ' + IntToHex(i1,  8) + ' ' + IntToHex(i2,  8) + ' (@s)');
 Memo1.Lines.Add('j : ' + IntToHex(j1,  8) + ' ' + IntToHex(j2,  8) + ' (s = String Record)');
 Memo1.Lines.Add('x^: ' + IntToHex(x1^, 8) + ' ' + IntToHex(x2^, 8) + ' (^s-0 = Content)');
 Memo1.Lines.Add('y^: ' + IntToHex(y1^, 8) + ' ' + IntToHex(y2^, 8) + ' (^s-4 = Length)');
 Memo1.Lines.Add('z^: ' + IntToHex(z1^, 8) + ' ' + IntToHex(z2^, 8) + ' (^s-8 = RefCount)');


 s2 := s1;
 Memo1.Lines.Add('');
 l1 := Length(s1);
 l2 := Length(s2);
 p1 := pchar(s1);
 p2 := pchar(s2);
 i1 := Integer(@s1);
 i2 := Integer(@s2);
 j1 := Integer(p1);
 j2 := Integer(p2);
 x1 := PInteger(j1);
 x2 := PInteger(j2);
 y1 := PInteger(j1-4);
 y2 := PInteger(j2-4);
 z1 := PInteger(j1-8);
 z2 := PInteger(j2-8);
 Memo1.Lines.Add('l : ' + IntToStr(l1) + ' ' + IntToStr(l2));
 Memo1.Lines.Add('i : ' + IntToHex(i1,  8) + ' ' + IntToHex(i2,  8) + ' (@s)');
 Memo1.Lines.Add('j : ' + IntToHex(j1,  8) + ' ' + IntToHex(j2,  8) + ' (s = String Record)');
 Memo1.Lines.Add('x^: ' + IntToHex(x1^, 8) + ' ' + IntToHex(x2^, 8) + ' (^s-0 = Content)');
 Memo1.Lines.Add('y^: ' + IntToHex(y1^, 8) + ' ' + IntToHex(y2^, 8) + ' (^s-4 = Length)');
 Memo1.Lines.Add('z^: ' + IntToHex(z1^, 8) + ' ' + IntToHex(z2^, 8) + ' (^s-8 = RefCount)');

end;




Re: [fpc-devel] Performance of string handling in trunk

2013-06-25 Thread Michael Schnell

On 06/25/2013 01:19 PM, Hans-Peter Diettrich wrote:
 Efficient code must be based on a single encoding, with conversions 
only from and to the outer world (OS, files...).




That does not rule out temporarily storing a string in 
something that can hold any encoding type.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-25 Thread Hans-Peter Diettrich

Michael Schnell wrote:

On 06/24/2013 08:21 PM, Sven Barth wrote:



AnsiString:
  up to 2^23-1 characters, reference counted, system encoding 
(determined by the code page at compilation time AFAIK)


8 or 16 bit codes ? In Delphi XE this seems to be 16 bit, in Delphi 7 
and the currently released Lazarus this seems to be 8 Bits.


Please read before confusing everything.

In fact I did ask for a way to distinguish all this verbally (not the 
keywords in a source file) to allow for an unambiguous discussion. 
This needs names that denote the version of the library used.


Your recent messages still indicate that you never understood even 
string basics. Why don't you start adjusting your weird mind to the 
facts, as have been given repeatedly since years? :-(


DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-25 Thread Hans-Peter Diettrich

Michael Schnell wrote:

On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:



A RawByteString can obtain any encoding, so no conversions are required.
But when assigned back to an UnicodeString, the obtained encoding is 
used to convert the string.
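The behaviour described here can be sketched as follows, assuming FPC 3.x: a single byte $E9 is labelled as ISO-8859-1 (28591, an arbitrary choice) and then assigned to a UnicodeString, which decodes it using the page stored in the string. In ISO-8859-1, $E9 is U+00E9; labelled with another page, the same byte would decode to a different character. On Unix the cwstring unit is included because real code page conversions need a widestring manager there.

```pascal
program RawToUnicode;
{$mode objfpc}{$H+}
{$C+} // enable Assert
{$ifdef unix}
uses
  cwstring;  // widestring manager for real code page conversions on Unix
{$endif}

var
  r: RawByteString;
  u: UnicodeString;
begin
  SetLength(r, 1);
  r[1] := #$E9;
  SetCodePage(r, 28591, False);  // label the byte as ISO-8859-1
  u := r;                        // decoded with the page stored in r
  Assert(Length(u) = 1);
  Assert(Ord(u[1]) = $E9);       // U+00E9 LATIN SMALL LETTER E WITH ACUTE
  WriteLn('U+', HexStr(Ord(u[1]), 4));
end.
```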


That sounds good. The name "RAW" just misled me to think it would not 
hold a character encoding.


If this in fact is a completely dynamically encoded string type, things 
like TStringList should use same in their interface, thus preventing all 
conversions, when a string of any encoding type is stored there and 
retrieved to a variable of the appropriate dedicated encoding type 
(while being auto-converted if retrieved to variable forcing a different 
encoding).


This is not the case :-(

A variable cannot force a conversion when a RawByteString is assigned 
to it :-(


However, the documentation at 
http://docwiki.embarcadero.com/Libraries/XE4/en/System.RawByteString 
suggests that even they are not convinced that all this works properly 
:-(


At least it doesn't work as you expected.

So a decent system should _additionally_ provide completely unencoded 
strings with 8-, 16-, 32- and 64-bit elements for "technical" use 
(e.g. for pipes), this time not using the "RAW" naming :-)


It's *only* the use of strings with different encodings that makes 
conversions necessary. Efficient code must be based on a single 
encoding, with conversions only from and to the outer world (OS, files...).
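A minimal sketch of the "single internal encoding, conversions only at the boundaries" approach, written in Python purely as a conceptual model (it is not FPC or Delphi RTL code). Python's `str` stands in for the one internal encoding; the function names, the sample data and the cp1252/UTF-8 choice are hypothetical.

```python
# Conceptual model only, not FPC/Delphi RTL code.
# Python's str plays the role of the single internal encoding;
# conversions happen only where data enters or leaves the program.

def read_lines(raw: bytes, source_codepage: str) -> list:
    """Decode once at the input boundary (file, OS, network)."""
    return raw.decode(source_codepage).splitlines()


def process(lines: list) -> list:
    """Internal work: everything already shares one encoding, so no conversions."""
    return [line.upper() for line in lines if line]


def write_lines(lines: list, target_codepage: str) -> bytes:
    """Encode once at the output boundary."""
    return "\n".join(lines).encode(target_codepage)


data_in = "äpfel\nbirnen\n".encode("cp1252")   # hypothetical input file contents
data_out = write_lines(process(read_lines(data_in, "cp1252")), "utf-8")
print(data_out.decode("utf-8"))
```

The point of the split is that `process` never sees encoded bytes, so nothing inside it needs per-operation code page checks or conversions.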


DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-25 Thread Hans-Peter Diettrich

Michael Schnell wrote:

On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:
In fact it looks like only the string pointers are copied between 
AnsiString and RawByteString, with the refcount changed accordingly.
Supposedly the length, the encoding number and the code-unit byte count 
are copied, too.


Please understand reference counted memory objects :-]

DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-25 Thread Marco van de Voort
In our previous episode, Sven Barth said:
> AnsiString:
>up to 2^23-1 characters, reference counted, system encoding 
> (determined by the code page at compilation time AFAIK)

(2^31-1, obviously: it is a 32-bit variable, but many operations
use signed types)
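The arithmetic behind this correction: a 32-bit length field could in principle count up to 2^32-1, but once operations treat it as a signed 32-bit integer the usable maximum is 2^31-1, and one more wraps to a negative value. A small Python check, using `ctypes.c_int32` only to emulate a signed 32-bit variable:

```python
import ctypes

unsigned_max = 2**32 - 1   # full range of an unsigned 32-bit length
signed_max = 2**31 - 1     # range when the length is handled as a signed 32-bit integer

# ctypes.c_int32 truncates to 32 bits without overflow checks, like a native int32.
wrapped = ctypes.c_int32(signed_max + 1).value

print(unsigned_max, signed_max, wrapped)  # 4294967295 2147483647 -2147483648
```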
 
> WideString
>- on non-Windows: same as UnicodeString
>- on Windows: up to 2^23-1 characters (?), non reference counted (but 
> managed by OS), UTF-16 encoding

(UCS-2 before Windows 2000)
 
> String:
>- in all modes besides mode delphiunicode or modeswitch 
> unicodestrings with H-: ShortString

>- in all modes besides mode delphiunicode or modeswitch 
> unicodestrings with H+: AnsiString

>- in mode delphiunicode or modeswitch unicodestrings with H+: 
> UnicodeString
>(- don't know whether this is correct: in mode delphiunicode or 
> modeswitch unicodestrings with H-: ShortString)

Yes, {$mode delphiunicode}{$H-} results in ShortString (checked with SizeOf).

Note that {$mode delphi} and {$mode delphiunicode} also enable {$H+} while
e.g. mode objfpc doesn't.
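The rules in Sven's list, together with the two observations above ({$H-} gives ShortString even in delphiunicode mode, and delphi/delphiunicode switch {$H+} on by default while objfpc does not), can be condensed into a small lookup. This is a hypothetical Python helper that only encodes the cases stated in this thread for these three modes; it is not part of FPC.

```python
# Hypothetical helper: which type does the keyword `String` denote?
# Encodes only the rules stated in this thread, for three modes.

DEFAULT_H = {            # default state of {$H} per mode
    "objfpc": False,
    "delphi": True,
    "delphiunicode": True,
}


def string_alias(mode: str, unicodestrings: bool = False, h=None) -> str:
    """Return the type `String` maps to.

    mode: compiler mode name; unicodestrings: whether the modeswitch is on;
    h: explicit {$H+}/{$H-} setting, or None to use the mode default.
    """
    if h is None:
        h = DEFAULT_H[mode]
    if not h:
        return "ShortString"          # {$H-} always yields ShortString
    if mode == "delphiunicode" or unicodestrings:
        return "UnicodeString"
    return "AnsiString"


for mode in DEFAULT_H:
    print(mode, string_alias(mode))
```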


Re: [fpc-devel] Performance of string handling in trunk

2013-06-25 Thread Michael Schnell

On 06/24/2013 08:21 PM, Sven Barth wrote:



AnsiString:
  up to 2^23-1 characters, reference counted, system encoding 
(determined by the code page at compilation time AFAIK)


8- or 16-bit code units? In Delphi XE this seems to be 16 bits; in Delphi 7 
and the currently released Lazarus it seems to be 8 bits.


In fact I asked for a way to distinguish all this verbally (not by the 
keywords in a source file) to allow for an unambiguous discussion. This 
needs names that denote the version of the library used.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-25 Thread Michael Schnell

On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:
In fact it looks like only the string pointers are copied between 
AnsiString and RawByteString, with the refcount changed accordingly.
Supposedly the length, the encoding number and the code-unit byte count 
are copied, too.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-25 Thread Michael Schnell

On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:



A RawByteString can obtain any encoding, so no conversions are required.
But when assigned back to an UnicodeString, the obtained encoding is 
used to convert the string.


That sounds good. The name "RAW" just misled me into thinking it would not 
hold a character encoding.


If this really is a completely dynamically encoded string type, classes 
like TStringList should use it in their interface. That would prevent all 
conversions when a string of any encoding is stored there and retrieved 
into a variable of the matching encoding (while it would still be 
auto-converted when retrieved into a variable with a different encoding).


However, the documentation at 
http://docwiki.embarcadero.com/Libraries/XE4/en/System.RawByteString 
suggests that even they are not convinced that all this works properly 
:-(


So a decent system should _additionally_ provide completely unencoded 
strings with 8-, 16-, 32- and 64-bit elements for "technical" use 
(e.g. for pipes), this time not using the "RAW" naming :-)


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Hans-Peter Diettrich

Sven Barth wrote:

On 24.06.2013 16:44, Hans-Peter Diettrich wrote:

I hope I now understand that the type RawByteString (= String($)) 
means "code size = 1 byte, never to be auto-converted to any 
differently encoded string type variable".


No. Even though I would like such an encoding, too, Delphi doesn't implement
it.


But he is right. RawByteString is defined in unit System as 
AnsiString(CP_NONE), where CP_NONE is defined as $. This means that 
no conversions are done to or from a variable of this type (or of any 
other AnsiString type that has code page $).


Well, after some tests it looks more complicated to me.

A RawByteString can obtain any encoding, so no conversions are required.
But when assigned back to an UnicodeString, the obtained encoding is 
used to convert the string.
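The behaviour described here can be modelled as a byte payload that carries a code page tag. The Python sketch below is a conceptual model, not the FPC or Delphi implementation: `TaggedStr`, `assign_raw` and `to_unicode` are hypothetical names, and Python codec names stand in for numeric code pages. Assigning to the RawByteString-like target copies the bytes together with their tag, and the later conversion to Unicode uses whatever tag the value carries.

```python
from dataclasses import dataclass


@dataclass
class TaggedStr:
    """Hypothetical model: string bytes plus the code page they are encoded in."""
    data: bytes
    codepage: str  # Python codec name, e.g. "cp437" (OEM) or "cp1252" (ANSI)


def assign_raw(src: TaggedStr) -> TaggedStr:
    """RawByteString-like assignment: adopt the bytes and the tag, no conversion."""
    return TaggedStr(src.data, src.codepage)


def to_unicode(s: TaggedStr) -> str:
    """Conversion to a Unicode string uses the tag carried by the value."""
    return s.data.decode(s.codepage)


oem = TaggedStr("Grüße".encode("cp437"), "cp437")
ansi = TaggedStr("Grüße".encode("cp1252"), "cp1252")

raw_oem, raw_ansi = assign_raw(oem), assign_raw(ansi)
print(raw_oem.data != raw_ansi.data)                # different bytes...
print(to_unicode(raw_oem) == to_unicode(raw_ansi))  # ...same text
```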


In fact it looks like only the string pointers are copied between 
AnsiString and RawByteString, with the refcount changed accordingly. 
This can lead to strange results (in XE). As soon as an AnsiString has 
acquired a different encoding, no further conversions seem to occur. 
Once I copy an OEMString (CP 437) into a RawByteString, and from there 
into an AnsiString, the AnsiString has acquired the OEM encoding. Adding 
further strings of different code pages to it only concatenates them, 
without any conversion, and the encoding is still reported as OEM. This 
means that the encoding of an AnsiString is not guaranteed to be the 
defined one, nor even a single, well-defined one!


Can somebody test this with a newer Delphi version?

Resetting such an ill-behaved AnsiString seems to require a direct 
assignment from another AnsiString variable, whereupon the AnsiString will 
return to its *defined* encoding and resume any required conversions to 
that encoding.


DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Sven Barth

On 24.06.2013 16:44, Hans-Peter Diettrich wrote:

I hope I now understand that the type RawByteString (= String($)) 
means "code size = 1 byte, never to be auto-converted to any 
differently encoded string type variable".


No. Even though I would like such an encoding, too, Delphi doesn't implement
it.


But he is right. RawByteString is defined in unit System as 
AnsiString(CP_NONE), where CP_NONE is defined as $. This means that 
no conversions are done to or from a variable of this type (or of any 
other AnsiString type that has code page $).


Regards,
Sven


Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Sven Barth

On 24.06.2013 11:36, Michael Schnell wrote:

On 06/21/2013 07:43 PM, Sven Barth wrote:



Just to clear up the names: UnicodeString is *not* the code page aware
string type (although they share the "metadata" record). It is a
dynamic length 2 byte string. The code page aware string type is
AnsiString.



Thanks for making this clear.

Could you give us a list of the different string types (legacy and to be 
supported) we might be seeing, including their "official" names, to make 
the discussion less ambiguous?


ShortString:
  up to 255 characters, non reference counted, system encoding

String[X]:
  same as ShortString with maximum of X characters

AnsiString:
  up to 2^23-1 characters, reference counted, system encoding 
(determined by the code page at compilation time AFAIK)


AnsiString(X):
  same as AnsiString, but with the specified code page (UTF-16 code 
pages are not allowed)


RawByteString:
  basically AnsiString($) (AFAIK); no code page conversions are 
done when another AnsiString is assigned to it (a UnicodeString is 
converted to the currently active code page), and vice versa


UnicodeString:
  up to 2^23-1 characters, reference counted, UTF-16 encoding

WideString
  - on non-Windows: same as UnicodeString
  - on Windows: up to 2^23-1 characters (?), non reference counted (but 
managed by OS), UTF-16 encoding


String:
  - in all modes besides mode delphiunicode or modeswitch 
unicodestrings with H-: ShortString
  - in all modes besides mode delphiunicode or modeswitch 
unicodestrings with H+: AnsiString
  - in mode delphiunicode or modeswitch unicodestrings with H+: 
UnicodeString
  (- don't know whether this is correct: in mode delphiunicode or 
modeswitch unicodestrings with H-: ShortString)


Regards,
Sven


Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Thaddy

On 24-6-2013 17:13, Michael Schnell wrote:

On 06/24/2013 04:44 PM, Hans-Peter Diettrich wrote:


Not in Delphi. For binary data TBytes has been added.
Which (AFAIK) is not reference counted, can't do "+" and is thus much 
less versatile.


It has also been highly controversial since XE4:

For example, there is a good breakdown at 
http://blog.synopse.info/post/2013/05/11/Delphi-XE4-NextGen-compiler-is-disapointing


This is by no means the only complaint about the latest "string" 
whatever it is supposed to be. ;)


Thaddy


Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Michael Schnell

On 06/24/2013 04:44 PM, Hans-Peter Diettrich wrote:


Not in Delphi. For binary data TBytes has been added.
Which (AFAIK) is not reference counted, can't do "+" and is thus much less 
versatile.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Hans-Peter Diettrich

Michael Schnell wrote:

On 06/24/2013 12:43 PM, Hans-Peter Diettrich wrote:
 I e.g. remember your strange (Delphi incompatible) opinions about 
RawByteString and encodings in a startup discussion.


Yep. As I did not have DTX to try it, I only read what I could find on 
the internet and presumably got it wrong.


Yes, still wrong despite earlier explanations :-(

I hope I now understand that the type RawByteString (= String($)) 
means "code size = 1 byte, never to be auto-converted to any 
differently encoded string type variable".


No. Even though I would like such an encoding, too, Delphi doesn't implement it.

I gather that DXE does not provide a fully dynamic string type (e.g. 
one usable as a function parameter that takes any String(x) type without 
auto-conversion). I still hope that fpc will provide this one day.


This is what RawByteString is for. A RawByteString can have *any* 
encoding, it's kind of a generic AnsiString. Other AnsiStrings have a 
*fixed* encoding, which determines any required conversions.



Moreover I do hope for RawWordString, RawDwordString and RawQWordString.


Not in Delphi. For binary data TBytes has been added.

DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Michael Schnell

On 06/24/2013 03:11 PM, Marco van de Voort wrote:

You can funnily store utf8 in type ansistring under Delphi 7 too.
Yep. But D7 does not rely on any string being encoded in UTF-8 (rather in 
the ANSI code page defined by the system configuration), while the LCL API 
expects strings in UTF-8. _This_ is funny IMHO.


-Michael



Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> 
> I do know that the Delphi 7 compatible String in fpc has sometimes 
> been called AnsiString, while Lazarus funnily stores UTF-8 in the type 
> AnsiString, despite the naming.

You can funnily store utf8 in type ansistring under Delphi 7 too.


Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Michael Schnell

On 06/24/2013 12:43 PM, Hans-Peter Diettrich wrote:
 I e.g. remember your strange (Delphi incompatible) opinions about 
RawByteString and encodings in a startup discussion.


Yep. As I did not have DTX to try it, I only read what I could find on 
the internet and presumably got it wrong.


I hope I now understand that the type RawByteString (= String($)) 
means "code size = 1 byte, never to be auto-converted to any 
differently encoded string type variable".


I gather that DXE does not provide a fully dynamic string type (e.g. 
one usable as a function parameter that takes any String(x) type without 
auto-conversion). I still hope that fpc will provide this one day.


Moreover I do hope for RawWordString, RawDwordString and RawQWordString.

-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Michael Schnell

On 06/24/2013 12:43 PM, Hans-Peter Diettrich wrote:


This should be clear since a long time.


Sorry, but e.g. I don't know the "official" names of the Delphi 7 
compatible "String" and the Delphi XE compatible "String" in fpc/Lazarus.


I suppose that in DXE the Delphi 7 compatible String is not available at 
all, while in fpc this String type will still be available with the 
appropriate compiler options.


I do know that the Delphi 7 compatible String in fpc has sometimes 
been called AnsiString, while Lazarus funnily stores UTF-8 in the type 
AnsiString, despite the naming.


I seem to have read that in Delphi XE the strings are also called 
AnsiString, even though they work differently from what (the currently 
released) fpc calls by that name.


So a decent grid would be very helpful.

-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Hans-Peter Diettrich

Michael Schnell wrote:

Could you give us a list of the different string types (legacy and to be 
supported) we might be seeing, including their "official" names, to make 
the discussion less ambiguous?


This should be clear since a long time. I e.g. remember your strange 
(Delphi incompatible) opinions about RawByteString and encodings in a 
startup discussion.


DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-24 Thread Michael Schnell

On 06/21/2013 07:43 PM, Sven Barth wrote:



Just to clear up the names: UnicodeString is *not* the code page aware 
string type (although they share the "metadata" record). It is a 
dynamic length 2 byte string. The code page aware string type is 
AnsiString.




Thanks for making this clear.

Could you give us a list of the different string types (legacy and to be 
supported) we might be seeing, including their "official" names, to make 
the discussion less ambiguous?


Thanks,
-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-22 Thread Florian Klämpfl
On 21.06.2013 16:29, Sergei Gorelkin wrote:
> and the fact that SetCodePage goes through implicit
> try..finally block even if it does not need to convert the string.

I've fixed this one in r24942



Re: [fpc-devel] Performance of string handling in trunk

2013-06-21 Thread Sven Barth
On 21.06.2013 10:36, "Michael Schnell" wrote:
>
> On 06/21/2013 09:54 AM, Hans-Peter Diettrich wrote:
>>
>>
>> Again I'd assume that the memory allocation for the result is the most
>> expensive operation with UnicodeString operands, independent from string
>> lengths.
>>
>
> Do you suggest that with UnicodeString (even when using 1-byte encodings
> such as ANSIxxx or UTF-8) memory allocation is more expensive than with
> the older String handling implementation?

Just to clear up the names: UnicodeString is *not* the code page aware
string type (although they share the "metadata" record). It is a dynamic
length 2 byte string. The code page aware string type is AnsiString.

Regards
Sven


Re: [fpc-devel] Performance of string handling in trunk

2013-06-21 Thread Hans-Peter Diettrich

Michael Schnell wrote:

On 06/21/2013 02:20 PM, Hans-Peter Diettrich wrote:


Please note that I was *not* talking about AnsiStrings.


Sorry I don't understand.


You snipped the context, which was UnicodeString (second case). The 
AnsiString case was covered before.


I reckon the OP, asking about a performance hit, meant a degradation 
of the "new" (Delphi XE compatible) vs. the "old" (Delphi 7 compatible) 
String library.


Right, and there seem to be more issues with the current implementation. 
E.g. I don't understand the many tests in the RawByteString 
concatenation, and others found excess try-finally blocks and 
UniqueString calls.


Another reason may be the (old?) TStringList in the test program, 
possibly using AnsiStrings, which will cause overhead when used with 
UnicodeStrings. I haven't done my own research yet; my statements are 
based on general considerations and assume a perfect implementation.


DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-21 Thread Michael Schnell

On 06/21/2013 04:29 PM, Sergei Gorelkin wrote:
 What catches the eye is the large amount of calls to UniqueString, 


It would be interesting to see whether the old (not "new Unicode 
library") build makes the same number of UniqueString calls. I don't see 
why the new library should make more of these calls or why they should be 
slower (while using the same encoding).


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-21 Thread Sergei Gorelkin

On 21.06.2013 17:11, luiz americo pereira camara wrote:

2013/6/21 Michael Schnell :

On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:


Maybe in that example an (unneeded) conversion is going on?


If you use the same string type all over the place it would be a severe bug
if _any_ conversion is done.

Please check.



The affected code can be seen here
http://forum.lazarus.freepascal.org/index.php/topic,21223.0.html .

I don't have a 2.7.1 setup, so I can't debug it myself. I'm just reporting
what the user found.

I've profiled the code and found no conversions taking place. All the slowdown appears to be caused 
by other reasons, hard to tell the topmost contributor. What catches the eye is the large amount of 
calls to UniqueString, and the fact that SetCodePage goes through implicit try..finally block even 
if it does not need to convert the string.
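For readers unfamiliar with UniqueString: reference-counted strings are copy-on-write, so an assignment only shares the buffer and bumps the count, and UniqueString copies the buffer only when it is shared, right before an in-place modification. Each call costs at least a reference-count check, which is why a large number of them stands out in a profile. The Python sketch below is a conceptual model with hypothetical class and method names, not the RTL implementation.

```python
class RefCountedBuffer:
    """Shared character buffer with a reference count."""

    def __init__(self, data: bytearray):
        self.data = data
        self.refcount = 1


class CowString:
    """Hypothetical model of a reference-counted, copy-on-write string."""

    def __init__(self, text: bytes = b""):
        self._buf = RefCountedBuffer(bytearray(text))

    def assign_from(self, other: "CowString") -> None:
        # Assignment only copies the reference and adjusts the counts.
        other._buf.refcount += 1
        self._buf.refcount -= 1
        self._buf = other._buf

    def unique(self) -> None:
        # Counterpart of UniqueString: copy the buffer only if it is shared.
        if self._buf.refcount > 1:
            self._buf.refcount -= 1
            self._buf = RefCountedBuffer(bytearray(self._buf.data))

    def set_char(self, index: int, value: int) -> None:
        self.unique()  # must run before every in-place write
        self._buf.data[index] = value

    def value(self) -> bytes:
        return bytes(self._buf.data)


a = CowString(b"hello")
b = CowString()
b.assign_from(a)             # shares a's buffer; refcount is now 2
b.set_char(0, ord("j"))      # buffer is shared, so it is copied first
print(a.value(), b.value())  # b'hello' b'jello'
```

A second write to `b` finds the buffer already unique and skips the copy; only the check remains.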


Regards,
Sergei



Re: [fpc-devel] Performance of string handling in trunk

2013-06-21 Thread Michael Schnell

On 06/21/2013 02:20 PM, Hans-Peter Diettrich wrote:


Please note that I was *not* talking about AnsiStrings.


Sorry I don't understand.

I reckon the OP, asking about a performance hit, meant a degradation 
of the "new" (Delphi XE compatible) vs. the "old" (Delphi 7 compatible) 
String library.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-21 Thread luiz americo pereira camara
2013/6/21 Michael Schnell :
> On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
>>
>> Maybe in that example an (unneeded) conversion is going on?
>
> If you use the same string type all over the place it would be a severe bug
> if _any_ conversion is done.
>
> Please check.
>

The affected code can be seen here
http://forum.lazarus.freepascal.org/index.php/topic,21223.0.html .

I don't have a 2.7.1 setup, so I can't debug it myself. I'm just reporting
what the user found.

Luiz


Re: [fpc-devel] Performance of string handling in trunk

2013-06-21 Thread Hans-Peter Diettrich

Michael Schnell wrote:

On 06/21/2013 09:54 AM, Hans-Peter Diettrich wrote:


Again I'd assume that the memory allocation for the result is the most 
expensive operation with UnicodeString operands, independent from 
string lengths.




Do you suggest that with UnicodeString (even when using 1-byte encodings 
such as ANSIxxx or UTF-8) memory allocation is more expensive than with 
the older String handling implementation?


Please note that I was *not* talking about AnsiStrings.

DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-21 Thread Michael Schnell

On 06/21/2013 09:54 AM, Hans-Peter Diettrich wrote:


Again I'd assume that the memory allocation for the result is the most 
expensive operation with UnicodeString operands, independent from 
string lengths.




Do you suggest that with UnicodeString (even when using 1-byte encodings 
such as ANSIxxx or UTF-8) memory allocation is more expensive than with 
the older String handling implementation?


Why? In fact, the roughly 8 additional bytes for the code element length 
and code page fields (on top of the already existing String-Length, 
Content-Address and Ref-Count DWords) should not matter at all.


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-21 Thread Hans-Peter Diettrich

Michael Schnell wrote:


On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
The point is that I would expect a smaller performance hit when 
no conversion is going on, something around 10% slower. In the 
cited case it is more than 50% slower.
As the "dynamic" types of (most) string variables are already defined 
and known at compile time, and thus the library (usually) does not need 
to detect the encoding at run time, the performance hit should be close 
to zero, as long as the same string encoding is used as in the 
non-(DXE-compatible)-Unicode build of the same source code.


Right. Even with RawByteString the test for same encoding should not 
take considerable time (compared with memory allocation...).


Different encodings require converting both arguments to Unicode, and 
the result back to the target encoding.
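The two paths described here (a cheap same-encoding check on every operation, and decode, concatenate, encode only on a mismatch) can be sketched in Python. This is a conceptual model with a hypothetical `concat` function and an instrumentation counter, not FPC's fpc_AnsiStr_Concat; Python codec names stand in for code pages.

```python
from collections import Counter

stats = Counter()  # instrumentation only: how often each path runs


def concat(a: bytes, a_cp: str, b: bytes, b_cp: str, target_cp: str) -> bytes:
    """Concatenate two code-page-tagged byte strings into target_cp."""
    stats["checks"] += 1                    # the check runs on every call
    if a_cp == b_cp == target_cp:
        return a + b                        # same encoding: plain byte copy
    stats["conversions"] += 1               # mismatch: go through Unicode
    text = a.decode(a_cp) + b.decode(b_cp)  # both operands to Unicode
    return text.encode(target_cp)           # result back to the target encoding


s = b""
for _ in range(1000):
    s = concat(s, "cp1252", b"x", "cp1252", "cp1252")
mixed = concat("ä".encode("cp437"), "cp437", "ö".encode("cp1252"), "cp1252", "utf-8")
print(stats["checks"], stats["conversions"], mixed.decode("utf-8"))
```

In this model the same-encoding loop never converts, yet it still pays for 1000 checks, which is the kind of unavoidable overhead Sergei describes; the conversion cost only appears for the one mixed call.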


OTOH, if the former version used 1-byte strings (ANSI or UTF-8) and the 
new version used 16- or 32-bit strings (UTF-16 or UTF-32), I would expect 
a severe performance hit as well, because more bytes need to be moved and 
the cache gets a lot tighter due to the (at least) doubled memory usage.


Again I'd assume that the memory allocation for the result is the most 
expensive operation with UnicodeString operands, independent from string 
lengths.


DoDi



Re: [fpc-devel] Performance of string handling in trunk

2013-06-21 Thread Michael Schnell

On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:

Maybe in that example an (unneeded) conversion is going on?
If you use the same string type all over the place it would be a severe 
bug if _any_ conversion is done.


Please check.

-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-21 Thread Michael Schnell

On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
The point is that I would expect a smaller performance hit when 
no conversion is going on, something around 10% slower. In the 
cited case it is more than 50% slower.
As the "dynamic" types of (most) string variables are already defined 
and known at compile time, and thus the library (usually) does not need 
to detect the encoding at run time, the performance hit should be close 
to zero, as long as the same string encoding is used as in the 
non-(DXE-compatible)-Unicode build of the same source code.


OTOH, if the former version used 1-byte strings (ANSI or UTF-8) and the 
new version used 16- or 32-bit strings (UTF-16 or UTF-32), I would expect 
a severe performance hit as well, because more bytes need to be moved and 
the cache gets a lot tighter due to the (at least) doubled memory usage.
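The size side of this argument is easy to check: for plain ASCII text UTF-16 needs two bytes per character where a 1-byte encoding needs one, and UTF-32 needs four. A short Python illustration (the "-le" codecs are used so that no byte-order mark is added); for non-ASCII text the UTF-8 figure grows, so the ratio is smaller there.

```python
def encoded_sizes(text: str) -> dict:
    """Byte counts of the same text in several encodings (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


print(encoded_sizes("string handling"))   # ASCII only
print(encoded_sizes("Grüße"))             # two non-ASCII characters
```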


-Michael


Re: [fpc-devel] Performance of string handling in trunk

2013-06-20 Thread Sergei Gorelkin

On 20.06.2013 19:31, luiz americo pereira camara wrote:


Maybe in that example an (unneeded) conversion is going on?


This is possible. One needs to profile the example to tell for sure.

Regards,
Sergei


Re: [fpc-devel] Performance of string handling in trunk

2013-06-20 Thread luiz americo pereira camara
2013/6/20 Sergei Gorelkin :
> On 20.06.2013 16:15, luiz americo pereira camara wrote:
>
>> I looked at
>> http://forum.lazarus.freepascal.org/index.php/topic,21223.0.html
>>
>> There's a significant performance drop in fpc trunk
>>
>> Is there anything wrong or this is the expected result?
>>
> Some slowdown is of course the expected result: it is impossible to add all
> codepage stuff without performance impact. Even though conversions happen
> only when codepages differ, the code which checks the codepages is executed
> anyway on every operation.

I know that.

The point is that I would expect a smaller performance hit when
no conversion is going on, something around 10% slower. In the
cited case it is more than 50% slower.

> The question is, which part of observed slowdown
> is unavoidable and which can be eliminated by more accurate implementation.

Maybe in that example an (unneeded) conversion is going on?

Luiz


Re: [fpc-devel] Performance of string handling in trunk

2013-06-20 Thread Sergei Gorelkin

On 20.06.2013 16:15, luiz americo pereira camara wrote:

I looked at http://forum.lazarus.freepascal.org/index.php/topic,21223.0.html

There's a significant performance drop in fpc trunk

The difference in the generated code is a call to fpc_ansistr_assign and a
different implementation of fpc_AnsiStr_Concat.

AFAIK there should be a significant performance hit only when assigning
strings with different code pages. That does not seem to be the case here.

Is there anything wrong or this is the expected result?

Some slowdown is of course the expected result: it is impossible to add all codepage stuff without 
performance impact. Even though conversions happen only when codepages differ, the code which checks 
the codepages is executed anyway on every operation. The question is, which part of observed 
slowdown is unavoidable and which can be eliminated by more accurate implementation.


Regards,
Sergei
