Re: [Lazarus] String vs WideString
On 17.08.2017 04:16, "wkitty42--- via Lazarus" <lazarus@lists.lazarus-ide.org> wrote:
>
> On 08/16/2017 06:46 PM, Luca Olivetti via Lazarus wrote:
>>
>> I started using strings as communication buffers since delphi 2. There
>> weren't even dynamic arrays then...
>
> really? delphi came from TP/BP... i was (still am, actually) using dynamic arrays in TP6 ;)

Dynamic arrays in the form of "array of Type" were only introduced in Delphi 3 if I remember correctly. Anything before that needed manual memory management.

Regards,
Sven
-- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] String vs WideString
On 08/16/2017 06:46 PM, Luca Olivetti via Lazarus wrote: I started using strings as communication buffers since delphi 2. There weren't even dynamic arrays then... really? delphi came from TP/BP... i was (still am, actually) using dynamic arrays in TP6 ;) -- NOTE: No off-list assistance is given without prior approval. *Please keep mailing list traffic on the list unless* *a signed and pre-paid contract is in effect with us.* -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] String vs WideString
On 08/16/2017 07:30 PM, Graeme Geldenhuys via Lazarus wrote:
> On 2017-08-16 18:35, Sven Barth via Lazarus wrote:
>> You are wrong. The string types in 3.0.x and 3.1 are like this:
>
> Thanks for correcting me. I was thinking of the "$modeswitch
> unicodestring" option.

will that modeswitch take care of the warning about explicit conversion between ansistring and unicode string when one has

var foo : unicodestring;
writeln(padright(foo,5));

??

i wrote a quick and simple little array exhibit program for someone... i had thought to try to embrace this new unicode stuff by using unicode strings... then using padright and similar string manipulators gave me warnings about ansistring conversions :?

NOTE: this may be because i have an older lazarus and fpc installed... lazarus fixes 1.6.1 and fpc fixes 3.0.something...
Re: [Lazarus] String vs WideString
On 2017-08-16 23:46, Luca Olivetti via Lazarus wrote: I started using strings as communication buffers since delphi 2. There weren't even dynamic arrays then... Well, Link-Lists existed from the beginning of time. I used them plenty in my TP days, and adding, inserting, indexing etc was pretty easy. Maybe programmers have just become spoilt over time with all the "out of the box" functionality and actually become lazy in coding. Regards, Graeme -- fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal http://fpgui.sourceforge.net/ My public PGP key: http://tinyurl.com/graeme-pgp -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] String vs WideString
On 2017-08-16 19:26, Luca Olivetti via Lazarus wrote:
> I mean, TBytes is just an "array of char".

NO! Char can now mean a 1-byte char or a 2-byte char (I don't know how FPC plans to support Unicode surrogate pairs, which will require 4 bytes). In the olden days (Delphi 7 and FPC 2.6.4) the Char type might always have meant 1 byte, but it doesn't necessarily these days. TBytes has always been a container for Byte data.

Regards,
Graeme
Re: [Lazarus] String vs WideString
On 2017-08-16 18:35, Sven Barth via Lazarus wrote:
> You are wrong. The string types in 3.0.x and 3.1 are like this:

Thanks for correcting me. I was thinking of the "$modeswitch unicodestring" option.

Regards,
Graeme
Re: [Lazarus] String vs WideString
On 16.08.2017 20:26, Luca Olivetti via Lazarus wrote:
> On 16/08/17 at 01:17, Graeme Geldenhuys via Lazarus wrote:
>
>> In hindsight, using TBytes or TMemoryStream would have made it
>> very clear that it is a storage container for byte-sized data, and no
>> automatic conversion (by the compiler) would be done to data stored in
>> such containers.
>
> Call me lazy but I don't want to reinvent the wheel and re-implement
> from scratch the functionality that a plain ansistring provides and
> TBytes to this day doesn't.
> I mean, TBytes is just an "array of char". I can't (easily) add a byte
> to the end, cut a slice of the bytes, find one byte in the array, etc.
> OK, I can, but I have to program it all by myself while a string does
> all that and more and probably it's a lot more efficient.

Trunk supports Insert() and Delete() on dynamic arrays; Concat() and + are on the near-term to-do list.

Regards,
Sven
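The operations Luca lists (append, insert, cut, slice) can be sketched on a TBytes as below. This is only an illustration, not code from the thread: it assumes a compiler recent enough to have the dynamic-array Insert() and Delete() that Sven mentions, and the program name and byte values are made up.

```pascal
program bytesdemo;
{$mode objfpc}{$H+}

uses
  SysUtils;  // declares TBytes and IntToHex

var
  Buf, Slice: TBytes;
  B: Byte;
  i: Integer;
begin
  SetLength(Buf, 3);
  Buf[0] := $10;
  Buf[1] := $20;
  Buf[2] := $30;

  // Append one byte by growing the array; this needs no newer features.
  SetLength(Buf, Length(Buf) + 1);
  Buf[High(Buf)] := $40;

  // Insert a byte at (0-based) index 1.
  // Assumption: the compiler supports Insert() on dynamic arrays.
  B := $15;
  Insert(B, Buf, 1);

  // Remove the first two bytes.
  // Assumption: the compiler supports Delete() on dynamic arrays.
  Delete(Buf, 0, 2);

  // Take a slice of two bytes starting at index 1.
  Slice := Copy(Buf, 1, 2);

  for i := 0 to High(Buf) do
    Write(IntToHex(Buf[i], 2), ' ');
  Writeln;
  for i := 0 to High(Slice) do
    Write(IntToHex(Slice[i], 2), ' ');
  Writeln;
end.
```

Note that all indices here are 0-based, unlike string indices.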
Re: [Lazarus] dynamic string proposal
On 16/08/2017 20:44, Juha Manninen via Lazarus wrote:
>> So using "char" (the type) as reference to "codepoint" is something we have
>> to do, because today the type "char" is for codepoints.
>
> Sorry I didn't understand this one. "Char" (the type) holds a codeunit, not a codepoint. Char is either 1

Right, yes. Genuine mistake with all the confusion.
Re: [Lazarus] dynamic string proposal
On Wed, Aug 16, 2017 at 7:53 PM, Martin Frb via Lazarus wrote:
>> I know CodeUnit and CodePoint are not called "character" officially by
>> the Unicode Standard.
>> They however are called "character" in normal communication.
>
> And that is where the problem starts.
> ...

Exactly. Discussions where the word "character" is used are very vague and inaccurate.

> So using "char" (the type) as reference to "codepoint" is something we have
> to do, because today the type "char" is for codepoints.

Sorry, I didn't understand this one. "Char" (the type) holds a codeunit, not a codepoint. Char is either 1 byte or 2 bytes depending on whether it maps to AnsiChar or WideChar, for UTF-8 or UTF-16 respectively.

Juha
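The point about the size of a code unit can be checked with a few SizeOf calls. The following is a minimal sketch (the program name is made up); it compiles in {$mode objfpc}{$H+}, where Char is expected to map to AnsiChar. Switching to {$mode delphiunicode} should make the first line report the WideChar size instead.

```pascal
program charsizes;
{$mode objfpc}{$H+}

begin
  // Char is an alias whose size depends on the compiler mode.
  Writeln('SizeOf(Char)     = ', SizeOf(Char));
  // The two explicit code unit types have fixed sizes.
  Writeln('SizeOf(AnsiChar) = ', SizeOf(AnsiChar));
  Writeln('SizeOf(WideChar) = ', SizeOf(WideChar));
end.
```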
Re: [Lazarus] String vs WideString
El 16/08/17 a les 20:26, Luca Olivetti via Lazarus ha escrit: El 16/08/17 a les 01:17, Graeme Geldenhuys via Lazarus ha escrit: In hind sight, using TBytes or TMemoryStream and it would have been very clear that it is a storage container for byte sized data, and no automatic conversion (by the compiler) would be done to data stored in such containers. Call me lazy but I don't want to reinvent the wheel and re-implement from scratch the functionality that a plain ansistring provides and TBytes to this day doesn't. I mean, TBytes is just an "array of char". I can't (easily) add a byte to the end, cut a slice of the bytes, find one byte in the array, etc. OK, I can, but I have to program it all by myself while a string does all that and more and probably it's a lot more efficient. Not to mention that its index starts from 0. If I wanted to program in C I would be programming in C, not pascal ;-) Bye -- Luca Olivetti Wetron Automation Technology http://www.wetron.es/ Tel. +34 93 5883004 (Ext.3010) Fax +34 93 5883007 -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] String vs WideString
El 16/08/17 a les 01:17, Graeme Geldenhuys via Lazarus ha escrit: In hind sight, using TBytes or TMemoryStream and it would have been very clear that it is a storage container for byte sized data, and no automatic conversion (by the compiler) would be done to data stored in such containers. Call me lazy but I don't want to reinvent the wheel and re-implement from scratch the functionality that a plain ansistring provides and TBytes to this day doesn't. I mean, TBytes is just an "array of char". I can't (easily) add a byte to the end, cut a slice of the bytes, find one byte in the array, etc. OK, I can, but I have to program it all by myself while a string does all that and more and probably it's a lot more efficient. Bye -- Luca Olivetti Wetron Automation Technology http://www.wetron.es/ Tel. +34 93 5883004 (Ext.3010) Fax +34 93 5883007 -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] String vs WideString
On 16.08.2017 11:08, Graeme Geldenhuys via Lazarus wrote:
> On 2017-08-16 09:43, Michael Schnell via Lazarus wrote:
>> IMHO, any implementation of TStrings that forces a conversion (just
>> because the class uses TStrings and not due to a logical demand), is a
>> contradiction to providing code aware strings at all.
>
> But in FPC 3.x (using modern compiler modes - not TP or Mac) String =
> UnicodeString. So it makes sense that TStrings should use UnicodeString
> internally to store its data. The Unicode standard is also the only
> standard that can support any language. So all Windows code-pages can be
> supported with the single UnicodeString type.

You are wrong. The string types in 3.0.x and 3.1 are like this:

TP, Iso, ExtPas, MacPas, FPC, ObjFPC (or below modes with $H-): String = ShortString
Delphi (or other modes with $H+): String = AnsiString (or more precisely String(CP_ACP), meaning the system codepage)
Delphi_Unicode (or other modes with $H+ and $modeswitch unicodestring): String = UnicodeString

Regards,
Sven
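One way to observe the effect of $H on the meaning of String is to compare sizes: a ShortString occupies 256 bytes, while an AnsiString variable is a single pointer. The sketch below is illustrative only; the type names TStrHMinus and TStrHPlus are made up.

```pascal
program strkinds;
{$mode objfpc}

{$H-}
type
  TStrHMinus = String;  // with $H- this is ShortString

{$H+}
type
  TStrHPlus = String;   // with $H+ (and no unicodestring modeswitch) this is AnsiString

begin
  // ShortString: one length byte plus 255 characters.
  Writeln('SizeOf(String) with $H-: ', SizeOf(TStrHMinus));
  // AnsiString: a reference to heap data, so pointer-sized.
  Writeln('String with $H+ is pointer-sized: ',
          SizeOf(TStrHPlus) = SizeOf(Pointer));
end.
```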
Re: [Lazarus] String vs WideString
On 15.08.2017 10:34, Tony Whyman via Lazarus wrote:
> On 14/08/17 17:47, Sven Barth via Lazarus wrote:
>> The main problem of such a dynamic type would be the inability to do
>> fast indexing as the compiler would need to insert runtime checks for
>> the size of a character. I had already thought the same, but then had
>> to discard the idea due to this.
>
> Is this really a big problem? It is not as if it would be necessary to
> do a table lookup everytime you index a string as the indexing method
> could be an attribute of the string and updated with the character
> encoding attribute. Is it really that complicated for the compiler to
> generate code that jumps to an indexing method depending upon a data
> attribute?

In a tight loop where one accesses the string character by character (take Pos() for example) this will lead to a significant slowdown, as the compiler (without optimizations) will have to insert a call to the lookup function for each access. While I generally don't consider performance degradation a backwards compatibility issue, I do in this case, due to the significant decrease in performance.

Take this evaluation example:

=== code begin ===

program tperf;

{$mode objfpc}{$H+}

uses
  SysUtils;

function lookup(const aStr: String; aIndex: SizeInt): Char;
begin
  Result := aStr[aIndex];
end;

var
  str: String;
  starttime, endtime: TDateTime;
  i, j: LongInt;
begin
  SetLength(str, 1);

  starttime := Now;
  for i := 0 to 1 do
    for j := 1 to Length(str) do
      if str[j] <> '' then ;
  endtime := Now;
  Writeln('Direct: ', FormatDateTime('hh:nn:ss.zzz', endtime - starttime));

  starttime := Now;
  for i := 0 to 1 do
    for j := 1 to Length(str) do
      if lookup(str, j) <> '' then ;
  endtime := Now;
  Writeln('Lookup: ', FormatDateTime('hh:nn:ss.zzz', endtime - starttime));
end.

=== code end ===

=== output begin ===

Direct: 00:00:01.766
Lookup: 00:00:02.061

=== output end ===

While this example is of course artificial, it nevertheless shows the slowdown.

> Is your problem really more about the result type as, depending on the
> character width, the result could be an AnsiChar or WideChar or a UTF8
> character for which I don't believe there is a defined char type (other
> than an arguable mis-use of UCS4Char)?

That is indeed also a problem. I might not have had that one in mind with my mail above, but I did back then when I had brainstormed this.

> I can accept that a clear up of this area would also have to extend to
> the char types as well - but I would also argue that that is well
> overdue. On a quick count, I found 7 different char types in the system
> unit.

And most important of all: any solution that is developed *MUST* be backwards compatible, so that means that at the very least the type aliases would remain anyway.

Regards,
Sven
Re: [Lazarus] dynamic string proposal
On 16/08/2017 16:55, Juha Manninen via Lazarus wrote:
> On Wed, Aug 16, 2017 at 6:24 PM, Martin Frb via Lazarus wrote:
>> Actually no.
>
> I know CodeUnit and CodePoint are not called "character" officially by
> the Unicode Standard. They however are called "character" in normal
> communication.

And that is where the problem starts. As long as people do this, even if they know it is incorrect, others will pick it up, and others will learn the wrong concepts. Calling codepoints "char" means that newcomers will think s[x] is a valid way to deal with chars. And that is wrong, even in utf32.

> For example in the "String vs WideString" thread most people used
> "character" as a synonym for CodePoint.

Lots of people used the word character as if it were the same as codeunit. But the question is: did they use it as a synonym? I.e. did they know they were substituting the wrong word? If so, why would they intentionally use misleading terms?

> For CodeUnit the term is very logical for historical reasons as the
> type "Char" is a short form of "Character".

That is why today it is a misnomer. So using "char" (the type) as reference to "codepoint" is something we have to do, because today the type "char" is for codepoints. That is different from the English word "char" and that can cause huge confusion. The English word "character", however, is unambiguous. It is not the name of a type. So it refers to character only, not to codepoint.
Re: [Lazarus] dynamic string proposal
On Wed, 16 Aug 2017 18:06:36 +0200 Michael Schnell via Lazaruswrote: >[...] > The only difference to the current status is that with the "dynamic" > string brand the content of the "bytes per element" field is not > predefined by the variable declaration but can change when something is > assigned to that (additional) brand of string variables (I feel that > this is clearly stated in the paper). Hence for that (additional) brand > of string variables the compiler needs to generate code to read this > field when implementing the built-in functions. This "dynamicstring" sounds like Rawbytestring times two. Any function accessing the inner chars of a "dynamicstring" has to handle Rawbytestring codepages and unicodestring and array of byte/word/dword. If this is the price for avoiding some conversions, many programmers will become unhappy. Michael, please tell me your proposal has some serious advantages. I don't see them. Mattias -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] dynamic string proposal
On 16.08.2017 17:55, Juha Manninen via Lazarus wrote: although Pos(), Copy() and Length() deal with CodeUnit resolution. I wonder how the new fancy string types would handle it without a performance penalty. This again is not in the scope of the paper, and supposed to stay as it is. S[x], Pos(), and friends work in terms of "bytes per element" bytes. The only difference to the current status is that with the "dynamic" string brand the content of the "bytes per element" field is not predefined by the variable declaration but can change when something is assigned to that (additional) brand of string variables (I feel that this is clearly stated in the paper). Hence for that (additional) brand of string variables the compiler needs to generate code to read this field when implementing the built-in functions. -Michael -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] dynamic string proposal
On 16.08.2017 17:20, Juha Manninen via Lazarus wrote: Unicode is the standard now. We cannot ignore it, and we don't want to ignore it because it solves so many problems of the earlier solutions. If you create a new string type, you certainly must take Unicode into account. It is not "ignored", as it is handled by the conversion functions the functionality of which is not touched. The paper is just about storing the information in the strings (including the "encoding brand" and "bytes per element") fields. So the actual meaning of the stuff that is stored in the strings is beyond the scope of the paper. And supposed to stay as it currently is. -Michael -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] dynamic string proposal
On 16/08/2017 16:20, Juha Manninen via Lazarus wrote: The word "character" in Unicode can mean: 1. CodeUnit — Represented by Pascal type "Char". Actually no. It can overlap. But a codeunit is NOT a character. For example a codeunit that holds a codepoint of class "combining mark", this is not a character. It is just something that can form a character if combined with other codepoints. 2. CodePoint Also not a character. Same as above. Some Codepoints happen to also be a character. But some are not. -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] dynamic string proposal
On Wed, Aug 16, 2017 at 4:49 PM, Michael Schnell via Lazarus wrote:
>> You are writing about encodings etc. which are part of codepoints, but
>> you call them "characters". Why?
>
> Because the type for this stuff used in Delphi and FPC is called "char".

No, actually the Pascal type "Char" contains a CodeUnit, not CodePoint. It is the smallest fixed width "atom" of Unicode text. It is still extremely useful in Unicode related programming.

The word "character" in Unicode can mean:

1. CodeUnit - Represented by Pascal type "Char".
2. CodePoint - all the arguments about one encoding's supremacy over another deal with CodePoints. Yes, UTF-8, UTF-16, UTF-32 etc. all only encode CodePoints.
3. Abstract Unicode character - like 'WINE GLASS'.
4. Coded Unicode character - "U" + a unique number, like U+1F377. This is what "character" means in Unicode Standard.
5. User-perceived character - Whatever the end user thinks of as a character. This is language dependent. For instance, ‘ch’ is two letters in English but one letter in Czech and Slovak. Many more complexities are involved here, including decomposed codepoints.
6. Grapheme cluster
7. Glyph - related to fonts.

So, number 4. is the official Unicode "character". Otherwise the most useful meanings are 1. "CodeUnit" for programmers and 5. "User-perceived character" for everybody else.

Note, CodePoint is NOT a useful meaning for "character". It would only confuse things. Yet most people in these Unicode threads write about "character" like it meant CodePoint. It can only mean that those people are ignorant of the complexity of Unicode. :(

> In fact I did not explicitly talk about Unicode at all. the paper says it:
> ...

Unicode is the standard now. We cannot ignore it, and we don't want to ignore it because it solves so many problems of the earlier solutions. If you create a new string type, you certainly must take Unicode into account.

Juha
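The difference between meanings 1, 2 and 5 in the list above can be made concrete with a short sketch. It is an illustration only: the function name CountCodePoints is made up, the input is assumed to be valid UTF-8, and the bytes are written as numeric character constants so that the result does not depend on the source file encoding.

```pascal
program units_vs_points;
{$mode objfpc}{$H+}

// Counts code points in a UTF-8 byte sequence by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8.
function CountCodePoints(const S: RawByteString): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Precomposed, Decomposed: RawByteString;
begin
  Precomposed := #$C3#$A4;     // U+00E4 (a with diaeresis) in UTF-8
  Decomposed  := 'a'#$CC#$88;  // U+0061 followed by combining U+0308

  // Length() counts code units (bytes here), not code points.
  Writeln('precomposed: ', Length(Precomposed), ' code units, ',
          CountCodePoints(Precomposed), ' code point(s)');
  Writeln('decomposed:  ', Length(Decomposed), ' code units, ',
          CountCodePoints(Decomposed), ' code point(s)');
end.
```

Both strings would be seen by a reader as one user-perceived character, yet they differ in both code unit count and code point count, which is why neither count is a reliable stand-in for "character".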
Re: [Lazarus] String vs WideString
On Wed, Aug 16, 2017 at 11:37 AM, Juha Manninen via Lazarus wrote:
> On Wed, Aug 16, 2017 at 5:13 PM, Marcos Douglas B. Santos via Lazarus
> wrote:
>> Thanks. I know about this page... unfortunately looks like it is not
>> enough, since many others still complain.
>
> What is missing? I can try to improve it.

I cannot speak for others, but I had this issue (about WideString) for now.

>> This thread is not only about WinAPI. I have this problem because I
>> need to use a Windows 3rd Lib, which uses WideString.
>
> Then just use WideString or UnicodeString where needed. It is not a problem.

Are you saying that I need to do this? (following the first example on this thread)

=== begin ===

var
  U: UnicodeString;
  W: WideString;
begin
  U := IniFile.ReadString('TheLib', 'license', '');
  W := U;
  Lib.SetLicense(W);
  // ...
end;

=== end ===

...and I will not get a "Warning", right?

> Note, WideString is for OLE programming. Most often you should use
> UnicodeString. Their memory management differs.

Ok... thanks... but in my case it is an OLE object that I need to use.

Best regards,
Marcos Douglas
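For reference, a self-contained variant of the snippet above might look as follows. It is a sketch under assumptions, not an answer given in the thread: the license string is a made-up placeholder for the value read from the INI file, and the explicit UnicodeString() typecast is the usual way to tell the compiler the conversion is intended, which is expected to avoid the implicit string conversion warning.

```pascal
program widedemo;
{$mode objfpc}{$H+}

var
  A: AnsiString;     // stands in for the result of IniFile.ReadString
  U: UnicodeString;
  W: WideString;
begin
  A := 'license-key';      // hypothetical placeholder value

  // Explicit typecast: the conversion from AnsiString is requested
  // on purpose rather than happening implicitly.
  U := UnicodeString(A);

  // UnicodeString and WideString both hold UTF-16 code units.
  W := U;

  Writeln('UTF-16 code units: ', Length(W));
end.
```

The conversion interprets the AnsiString according to its code page, so the result is only correct if that code page matches the actual contents.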
Re: [Lazarus] dynamic string proposal
On 16/08/2017 13:48, Michael Schnell wrote: On 16.08.2017 14:30, Martin Frb via Lazarus wrote: And that would still not be "char", but "codepoint" A char can be composed of several combining code points (each of them afaik, in the 32 bit range). So a char can have 96 or more bits. (And not all of them have a combined form). Unfortunately in Delphi and FPC the appropriate work-alike existing type is called Char (with certain extensions). It would cause major problems to drop that name for something else, even if that would be appropriate. I agree. "char" actually is a "code unit". But renaming it, would probably be as good as killing the language. and anyone can do type codeunit=char; and use this. -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] dynamic string proposal
On 16/08/2017 13:37, Alexey via Lazarus wrote:
> On 16.08.2017 15:30, Martin Frb via Lazarus wrote:
>> A char can be composed of several combining code points (each of them
>> afaik, in the 32 bit range). So a char can have 96 or more bits. (And
>> not all of them have a combined form).
>
> See my prev post: i see that each S[i] good to be like QWord (sizeof(one
> char)= sizeof(Qword)). It can be TextChar. And type can be TextString.
> internally it can be compressed to utf8. TextString is good if i want to
> parse text by "chars". If "char" needs more bytes- lets take more
> (internally it is same utf8)

Have a look at https://www.reddit.com/r/Unicode/comments/4yie0a/tallest_longest_unicode_character/

There is ONE character that comprises more than 200 codepoints. The only way to store such a char is in a type of dynamic size (aka string).

Well, I couldn't find an official doc on what makes the boundaries of a char. But as far as I can see: if ä is one character, and it can be encoded as "non-combining codepoint" + "combining codepoint", then a character is any sequence of one "non-combining codepoint" + zero or more "combining codepoints". (AFAIK Arabic script has chars that have several "combining codepoints", so this is happening in actual languages.) The example, as far as I checked, fulfils this definition.
Re: [Lazarus] String vs WideString
On Wed, Aug 16, 2017 at 5:13 PM, Marcos Douglas B. Santos via Lazarus wrote:
> Thanks. I know about this page... unfortunately looks like it is not
> enough, since many others still complain.

What is missing? I can try to improve it.

> This thread is not only about WinAPI. I have this problem because I
> need to use a Windows 3rd Lib, which uses WideString.

Then just use WideString or UnicodeString where needed. It is not a problem.
Note, WideString is for OLE programming. Most often you should use UnicodeString. Their memory management differs.

Juha
Re: [Lazarus] String vs WideString
On Wed, Aug 16, 2017 at 6:12 AM, Juha Manninen via Lazaruswrote: > On Mon, Aug 14, 2017 at 4:11 PM, Marcos Douglas B. Santos via Lazarus > wrote: >> Unicode everywhere and you using AnsiString and doing everything... >> Now I'm confused. > > Yes, please read: > http://wiki.freepascal.org/Unicode_Support_in_Lazarus > I have advertised it so much that some people are already irritated, > but maybe you missed it so far. Thanks. I know about this page... unfortunately looks like it is not enough, since many others still complain. >> This is a ugly trick... but I understood what you mean. > > This was about the explicit temporary UnicodeString variable for > WinAPI call parameters. > No, it is not ugly, the code remains 100% compatible with Delphi. > Please remember also that direct WinAPI call are not needed in > cross-platform code. This thread is not only about WinAPI. I have this problem because I need to use a Windows 3rd Lib, which uses WideString. Best regards, Marcos Douglas -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] dynamic string proposal
On 16.08.2017 15:33, Juha Manninen via Lazarus wrote: Why don't you implement such a system. This is all FOSS, free and open source. I would never dare to try to edit the compiler :-[ You are writing about encodings etc. which are part of codepoints, but you call them "characters". Why? Because the type for this stuff used in Delphi and and FPC is called "char". Is it possible you don't know Unicode beyond codepoints? In fact I did not explicitly talk about Unicode at all. the paper says it: "In this article, a "String" is thought of as a reference counted ordered array of a number of "Things" (aka elements). (I feel that this is what the name String suggests.)" ..."If the elements of the strings are printable characters or partial codes of UTF. OK, this is nice (provided the conversion functions are in place) and makes doing programs handling conventional problems very easy" ... Do you have plans to tackle also the complex issues of Unicode? Not at all. If not, then your efforts are useless because codeunits and codepoints are easy in any case. I know. The intention was to handle a completely different problem from that you suggest here. You use energy for a problem that does not exist. I wrote the paper because I once was requested to do so in the fpc forum. -Michael -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] dynamic string proposal
On Wed, Aug 16, 2017 at 2:47 PM, Michael Schnell via Lazarus wrote:
> -Michael (It's rather frustrating to discuss that obviously never will
> happen :-()

Why don't you implement such a system? This is all FOSS, free and open source.

You are writing about encodings etc. which are part of codepoints, but you call them "characters". Why? Is it possible you don't know Unicode beyond codepoints? Do you have plans to tackle also the complex issues of Unicode? If not, then your efforts are useless because codeunits and codepoints are easy in any case. You use energy for a problem that does not exist.

Juha
Re: [Lazarus] dynamic string proposal
On Wed, Aug 16, 2017 at 3:37 PM, Alexey via Lazarus wrote:
> See my prev post: i see that each S[i] good to be like QWord (sizeof(one
> char)= sizeof(Qword)). It can be TextChar. And type can be TextString.
> internally it can be compressed to utf8. TextString is good if i want to
> parse text by "chars". If "char" needs more bytes- lets take more
> (internally it is same utf8)

No Alexey, you are now explaining codepoints. Codeunits and codepoints are the easy part in any case. Could you please define character?

Juha
Re: [Lazarus] dynamic string proposal
On 16.08.2017 14:43, Mattias Gaertner via Lazarus wrote:
> For some unknown reason you want to store different encodings in a
> TStrings and fear the "time-consuming" and loss-prone auto conversions.

It's obvious that a user using a different encoding brand in a string var than that suggested by TStrings (UTF-8 in fpc, UTF-16 in Delphi) implicitly triggers auto-conversion when handling the string. This has several consequences.

It might be a really good idea when, e.g., writing code that needs a certain operation in a loop that is very fast with UTF-16, while TStringList would store the data in a more compact way.

It might be time-consuming when the conversion is done without being necessary.

It might be error-prone when the user stores some random stuff in the string that cannot survive the conversion forth and back.

In any case all this happens without the user being aware of it, which might cause frustration.

-Michael
Re: [Lazarus] dynamic string proposal
On 16.08.2017 14:43, Mattias Gaertner via Lazarus wrote:
> Not if complicated things get more complicated.

Please leave out the additional encoding brands suggested just as an afterthought in the paper. These are not the purpose at all but (if the other stuff were in place) just come as a natural enhancement.

-Michael
Re: [Lazarus] GlobalMemoryStatus is Windows only, how to get installed RAM on Linux ?
On Wed, 16 Aug 2017, Landmesser John via Lazarus wrote:
> googled in vain ...
>
> ... and "TsmBios" ( -> Win/Linux https://github.com/RRUZ/tsmbios ) won't compile :-(
>
> So how to get Information about installed RAM on Linux for example?
>
> Ok, i could grep "hwinfo" or such in a terminal but thats not what i'm looking for.

Your best option is most likely to read /proc/meminfo and parse the result. It contains a wealth of information.

Michael.
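Reading /proc/meminfo as suggested could look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: the helper ParseKiB is made up, the file is assumed to contain a line of the form "MemTotal: <number> kB", and MemTotal reports the RAM usable by the kernel, which is typically somewhat less than the physically installed amount.

```pascal
program meminfo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Extracts the number from a line such as 'MemTotal:   16314532 kB'.
// Returns -1 if the line cannot be parsed.
function ParseKiB(const Line: String): Int64;
var
  Value: String;
  P: SizeInt;
begin
  P := Pos(':', Line);
  if P = 0 then
    Exit(-1);
  Value := Trim(Copy(Line, P + 1, MaxInt));
  // Drop the trailing unit, if there is one.
  P := Pos(' ', Value);
  if P > 0 then
    Value := Copy(Value, 1, P - 1);
  Result := StrToInt64Def(Value, -1);
end;

var
  F: TextFile;
  Line: String;
begin
  // Plain text-file I/O reads until end of file and does not rely
  // on the file size, which /proc files report as zero.
  AssignFile(F, '/proc/meminfo');
  Reset(F);
  try
    while not EOF(F) do
    begin
      ReadLn(F, Line);
      if Pos('MemTotal:', Line) = 1 then
        Writeln('Total RAM: ', ParseKiB(Line) div 1024, ' MiB');
    end;
  finally
    CloseFile(F);
  end;
end.
```

Other fields of the file, such as MemFree or MemAvailable, can be picked up the same way by matching a different prefix.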
[Lazarus] GlobalMemoryStatus is Windows only, how to get installed RAM on Linux ?
googled in vain ...

... and "TsmBios" ( -> Win/Linux https://github.com/RRUZ/tsmbios ) won't compile :-(

So how to get information about installed RAM on Linux, for example?

Ok, i could grep "hwinfo" or such in a terminal but that's not what i'm looking for.

Tips are welcome
Re: [Lazarus] dynamic string proposal
On 16.08.2017 14:22, Alexey via Lazarus wrote: BTW, it will be good to have "Cstring" (or another name, not "dynamicstring") : ... You are missing the point the paper is supposed to be about: enhancing the versatility of the library functions such as those using TStrings. Not just creating another type of strings, which is nothing but a prerequisite for the main purpose. -Michael -- ___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] dynamic string proposal
On Wed, 16 Aug 2017 15:22:20 +0300 Alexey via Lazarus wrote:
> On 16.08.2017 12:51, Mattias Gaertner via Lazarus wrote:
> > When you propose a new string type "dynamicstring" you have to define these
> > operators.
>
> BTW, it will be good to have "Cstring" (or another name, not
> "dynamicstring") :
>
> - [] operator is 0-based like Python/C
>
> - s[i] is DWORD per char (for all Unicode chars from 0 to MaxDWORD codes)
>
> PChar(s)/PWChar(s) wont work for it? so it is not ok idea? But this type
> can be compressed inside, eg in utf8. S[i] is DWORD outside. It is like
> some class.

This sounds as if you want a UTF-32 string type. Michael's proposal is a multi-encoded string type, storing Ansi, UTF-8, UTF-16 and UTF-32.

Mattias
Re: [Lazarus] dynamic string proposal
On 16.08.2017 14:30, Martin Frb via Lazarus wrote:

> And that would still not be "char", but "codepoint"
> A char can be composed of several combining code points (each of them afaik, in the 32 bit range). So a char can have 96 or more bits. (And not all of them have a combined form).

Unfortunately in Delphi and FPC the appropriate work-alike existing type is called Char (with certain extensions). It would cause major problems to drop that name for something else, even if that would be appropriate.

-Michael
Re: [Lazarus] dynamic string proposal
On Wed, 16 Aug 2017 13:47:26 +0200 Michael Schnell via Lazarus wrote:

> On 16.08.2017 13:17, Mattias Gaertner via Lazarus wrote:
> > You are confusing people if you name your encodings like this.
> There also is no "official" Code pages named "Default" or "None", the
> naming "CP_DEFAULT" and "CP_NONE" has just been invented by Embarcadero.

It is not about "official". A codepage describes a character set. What has your CP_QWORD to do with any character set?

>[...]
> > What is the intention of your proposal?
>
> That is given in the instructional paragraph "The problem":
> "The most obvious candidate for pain on that behalf is “TStrings”.

I read it, but I must admit, I don't understand it. For some unknown reason you want to store different encodings in a TStrings and fear the "time-consuming" and loss-prone auto conversions. And then it sounds as if this is a common problem ("much more urgent").

>[...]
> Enhancing the count of available encoding brandings is just a logical
> consequence of a less problem prone and more versatile (not implicitly
> restricted to printable text) overall string handling.

Who wants to have more encodings? AFAIK everyone wants fewer, preferably only one.

> -Michael (It's rather frustrating to discuss that obviously never will
> happen :-()

Not if complicated things get more complicated.

Mattias
Re: [Lazarus] dynamic string proposal
On 16.08.2017 15:30, Martin Frb via Lazarus wrote:

> A char can be composed of several combining code points (each of them afaik, in the 32 bit range). So a char can have 96 or more bits. (And not all of them have a combined form).

See my previous post: i think each S[i] should be like a QWord (sizeof(one char) = sizeof(QWord)). It can be TextChar. And the type can be TextString. Internally it can be compressed to utf8. TextString is good if i want to parse text by "chars". If a "char" needs more bytes, let's take more (internally it is the same utf8).

--
Regards, Alexey
Re: [Lazarus] dynamic string proposal
On 16/08/2017 10:51, Mattias Gaertner via Lazarus wrote:

>> Of course an appropriate "char" type for each string encoding brand could be provided, hence a "CP_QWord Char" as an alias or a QWord.
>
> There is no QWord codepage. That would be confusing.

And that would still not be "char", but "codepoint".

A char can be composed of several combining code points (each of them, afaik, in the 32 bit range). So a char can have 96 or more bits. (And not all of them have a combined form.)
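The point about combining code points can be checked with a short sketch. It is written in Python rather than Pascal purely for illustration, using the standard `unicodedata` module; the variable names are arbitrary.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character,
# two code points.
decomposed = "e\u0301"
# NFC normalization picks the precomposed form U+00E9 where one exists.
precomposed = unicodedata.normalize("NFC", decomposed)

# 'q' + combining acute has no precomposed code point in Unicode,
# so it stays two code points even after NFC normalization.
no_combined_form = unicodedata.normalize("NFC", "q\u0301")

print(len(decomposed), len(precomposed), len(no_combined_form))
```

Both `decomposed` and `precomposed` render as the same user-perceived character, yet they compare unequal as plain code point sequences unless normalized first.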
Re: [Lazarus] dynamic string proposal
On 16.08.2017 12:51, Mattias Gaertner via Lazarus wrote:

> When you propose a new string type "dynamicstring" you have to define these operators.

BTW, it will be good to have "Cstring" (or another name, not "dynamicstring") :

- [] operator is 0-based like Python/C

- s[i] is DWORD per char (for all Unicode chars from 0 to MaxDWORD codes)

PChar(s)/PWChar(s) won't work for it? So it is not an ok idea? But this type can be compressed inside, e.g. in utf8. S[i] is DWORD outside. It is like some class.

--
Regards, Alexey
Re: [Lazarus] dynamic string proposal
On 16.08.2017 13:17, Mattias Gaertner via Lazarus wrote:

> You are confusing people if you name your encodings like this.

There also are no "official" code pages named "Default" or "None"; the naming "CP_DEFAULT" and "CP_NONE" has just been invented by Embarcadero. So I did the same and just brainlessly extended the existing "CP..." naming scheme.

> Your "dynamicstring" supports char, widechar, byte, word, dword, qword. Why not shortint or smallint? Why not boolean, single and variant?

As pointed out, this is just a draft of a proposal, prone to enhancement and improvement.

> What is the intention of your proposal?

That is given in the instructional paragraph "The problem": "The most obvious candidate for pain on that behalf is “TStrings”.

Only a fully dynamically encoded version of TStrings and friends would allow for a solution for many string encoding related problems, as the user can't modify the string encoding brand TStrings uses and hence will face the described problems when he uses TStrings with all but one of the string encoding brandings he can choose from.

Enhancing the count of available encoding brandings is just a logical consequence of a less problem prone and more versatile (not implicitly restricted to printable text) overall string handling.

-Michael (It's rather frustrating to discuss that obviously never will happen :-()
Re: [Lazarus] dynamic string proposal
On Wed, 16 Aug 2017 12:24:55 +0200 Michael Schnell via Lazarus wrote:

> On 16.08.2017 11:51, Mattias Gaertner via Lazarus wrote:
> > Every Delphi/FPC type has a bunch of operators. Strings support :=, =,
> > <>, >=, <= and [] for read and write.
> > When you propose a new string type "dynamicstring" you have to define these
> > operators.
>[...]
> For "new" encoding brandings, such as CP_Byte, CP_Word, CP_DWord,
> CP_QWord, the working of the operators is obvious.

There are no such codepages. You are confusing people if you name your encodings like this.

> If somebody tries to
> compare a printable Text string with a string of binary elements, maybe
> the behavior is undefined.

Your "dynamicstring" supports char, widechar, byte, word, dword, qword. Why not shortint or smallint? Why not boolean, single and variant?

What is the intention of your proposal?

Mattias
Re: [Lazarus] String vs WideString
On 2017-08-16 11:05, Juha Manninen via Lazarus wrote:

> Unfortunately many other programmers had the same wrong idea or they were just lazy. The result anyway is a lot of broken UTF-16 code out there.

Yeah, I see that even in commercial products and projects. It's very sad to see. Hence I always promote UTF-8, and you can't get it wrong as easily as UTF-16. No endianness to worry about, no surrogate pairs and UTF-8 is ready for streaming (network or disk) out of the box.

Regards, Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal http://fpgui.sourceforge.net/
My public PGP key: http://tinyurl.com/graeme-pgp
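The endianness remark can be illustrated with a small sketch, in Python for brevity and using only the built-in codecs. UTF-8 is a byte stream with a single serialization, whereas the same UTF-16 text has two byte orders, and reading it with the wrong one silently yields different text. (UTF-8 still uses multi-byte sequences for anything outside ASCII, so it is not fixed-width either.)

```python
text = "A\u20ac"  # 'A' followed by the euro sign U+20AC

utf8 = text.encode("utf-8")          # one possible byte sequence
utf16_le = text.encode("utf-16-le")  # low byte first
utf16_be = text.encode("utf-16-be")  # high byte first

# Decoding big-endian bytes as little-endian does not fail here;
# it just produces two unrelated code points.
misread = utf16_be.decode("utf-16-le")

print(utf8.hex(), utf16_le.hex(), utf16_be.hex())
```

This is why UTF-16 files and network streams normally need a BOM or an out-of-band agreement on byte order.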
Re: [Lazarus] String vs WideString
On 16.08.2017 12:22, Juha Manninen via Lazarus wrote:

> You should stop writing in this thread now. I agree with Mattias.

I perfectly agree with you. But you can't blame me for answering when asked.

-Michael
Re: [Lazarus] dynamic string proposal
On 16.08.2017 11:51, Mattias Gaertner via Lazarus wrote:

> Every Delphi/FPC type has a bunch of operators. Strings support :=, =, <>, >=, <= and [] for read and write. When you propose a new string type "dynamicstring" you have to define these operators.

That is easily doable. The definition of := is discussed in the paper. (Only for := there is no accessible encoding definition for the left operand.) If the encoding branding is one of those that already exist, the current definition is used.

For "new" encoding brandings, such as CP_Byte, CP_Word, CP_DWord, CP_QWord, the working of the operators is obvious. If somebody tries to compare a printable Text string with a string of binary elements, maybe the behavior is undefined.

> There is no QWord codepage. That would be confusing.

Of course the term "Codepage" Embarcadero chose for the encoding identification is misleading in this context. That is why in the said paper it's called "encoding style" (which is not a really appropriate wording, either, but hey, it's just an initial suggestion and not yet a final documentation, and it had been clear from the beginning that it's in vain, anyway.)

-Michael
Re: [Lazarus] String vs WideString
On Wed, Aug 16, 2017 at 12:12 PM, Michael Schnell via Lazarus wrote:

> UTF-8 and UTF-16 are just encoding variants for 32 bit Unicode "characters",
> storing them in n (or 2*n) Bytes according to a simple scheme.

No, they are encodings for codepoints, not "characters" (whatever that means).

Michael Schnell, your posts are completely off topic. Unicode related topics clearly pull you like a magnet and then you lose all control and start to proclaim your grand plan for a string revamp. It can continue for months, as we remember from past years.

You should stop writing in this thread now. I agree with Mattias.

Juha
Re: [Lazarus] String vs WideString
On 16.08.2017 11:55, Mattias Gaertner via Lazarus wrote:

> 1,114,112 possible code points need at most 21 bits. Due to encoding at most 32bit.

Sorry. Typo.

-Michael
[Lazarus] OpenGL 4.6 bindings and generator
Hi all,

After finding the OpenGL bindings that come with Lazarus a bit on the ancient side of things (i think it only supports up to 4.0? Also there is a 4.3 version loading function but only seems to call 3.3's loader - ignoring 4.0 - and loads only a single extension) and never really liking the global functions of pointers approach (if nothing else it makes the autocompletion in the IDE a bit annoying) i decided to make some brand new bindings.

I wrote a parser for Khronos' XML spec (gl.xml) that generates the appropriate interface and implementation. You only have to call LoadGLProcs after you have a context ready and it tries to load everything it knows of. Instead of Load_some_extension you get a global Has_some_extension variable (these are initialized via LoadGLProcs too). As a bonus you get a HasExtension function as well as an AllExtensions array of strings.

Btw it is not a drop in replacement for GL/GLext/GLotherstuff, although you can use it with the OpenGL control that comes with Lazarus and most likely it is compatible as long as you don't use it from the same unit (since all they do is call the driver stuff anyway) and you initialize both using the same context (since different contexts might give different functions).

You can find the code as well as a pregenerated "OpenGL" unit at: http://runtimeterror.com/rep/gl2unit/index

At the moment only Windows is supported, but soon i'll add Linux and eventually Mac OS X support (it should be around ~10 lines of code for each OS, hopefully). Also i have done minimal testing so there might be bugs :-).

Kostas
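To give a rough idea of what parsing gl.xml involves, here is a minimal sketch in Python. It is not the gl2unit generator and makes no claim about how that tool works; the tiny XML fragment is a hand-written sample that only mimics the registry layout (`commands/command/proto`, `param`, `extensions/extension`), and the function names `list_commands` and `list_extensions` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment imitating the structure of Khronos' gl.xml.
SAMPLE_GL_XML = """
<registry>
  <commands namespace="GL">
    <command>
      <proto>void <name>glClear</name></proto>
      <param><ptype>GLbitfield</ptype> <name>mask</name></param>
    </command>
    <command>
      <proto><ptype>GLenum</ptype> <name>glGetError</name></proto>
    </command>
  </commands>
  <extensions>
    <extension name="GL_ARB_debug_output" supported="gl|glcore"/>
  </extensions>
</registry>
"""


def list_commands(root):
    """Return (name, return_type, [param declarations]) per command."""
    commands = []
    for cmd in root.findall("./commands/command"):
        proto = cmd.find("proto")
        name = proto.find("name").text
        # The prototype text is "<return type> <name>"; drop the name.
        full = "".join(proto.itertext()).strip()
        return_type = full[: -len(name)].strip()
        params = [" ".join("".join(p.itertext()).split())
                  for p in cmd.findall("param")]
        commands.append((name, return_type, params))
    return commands


def list_extensions(root):
    """Return the extension names declared in the registry."""
    return [ext.get("name") for ext in root.findall("./extensions/extension")]


root = ET.fromstring(SAMPLE_GL_XML.strip())
print(list_commands(root))
print(list_extensions(root))
```

A generator would then map each tuple to a procedural-type declaration plus a loader call, and each extension name to a Has_... flag.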
Re: [Lazarus] String vs WideString
On Mon, Aug 14, 2017 at 4:21 PM, Tony Whyman via Lazarus wrote:

> UTF-16/Unicode can only store 65,536 characters while the Unicode standard
> (that covers UTF8 as well) defines 136,755 characters.
> UTF-16/Unicode's main advantage seems to be for rapid indexing of large
> strings.

That shows complete ignorance from your side about Unicode. You consider UTF-16 as a fixed-width encoding. :(

Unfortunately many other programmers had the same wrong idea or they were just lazy. The result anyway is a lot of broken UTF-16 code out there.

On Tue, Aug 15, 2017 at 12:15 PM, Tony Whyman via Lazarus wrote:

> If a topic keeps on being discussed after 10+ years of argument, the reason
> is usually either (a) the problem and its solution have not been documented
> properly, or (b) the outcome is an unsatisfactory compromise.

Or (c) The people discussing are ignorant about the topic.

> I went back and read the wiki article you mentioned and was no more the
> wiser as to why the current mess exists. Is it really no more than because
> Delphi continues to screw up in this area, so must FPC? The body of the
> article appears to be a set of notes - not necessarily wrong in themselves
> but lacking the background and context needed to explain why it is like it is.

Hmmm... Originally the page was a mess because it had lots of irrelevant background info about the old obsolete LCL Unicode support. Text was added by many people but none was removed. Finally I cleaned the page. It now has most relevant info at the top and then special cases and technical details later. I am rather happy with the page now, it explains how to use Unicode with Lazarus as clearly as possible. However I am willing to improve it. What kind of background and context would you need?

> 1. Stop using the term "Unicode".

You can stop using it. No problem. For others however it is a well defined international standard. See: https://en.wikipedia.org/wiki/Unicode

> 2. Clean up the char type.
> ...
> Why shouldn't there be a single char type that intuitively represents
> a single character regardless of how many bytes are used to represent it.

What do you mean by "a single character"? A "character" in Unicode can mean about 7 different things. Which one is your pick? This question is for everybody in this thread who used the word "character".

> Yes, in a world where we have to live with UTF8, UTF16, UTF32, legacy code
> pages and Chinese variations on UTF8, that means that dynamic attributes
> have to be included in the type. But isn't that the only way to have
> consistent and intuitive character handling?

What do you mean? Chinese don't have a variation of UTF8. UTF8 is a global unambiguous encoding standard, part of Unicode.

The fundamental problem is that you want to hide the complexity of Unicode by some magic String type of a compiler. It is not possible. Unicode remains complex but the complexity is NOT in encodings! No, a codepoint's encoding is the easy part. For example I was easily able to create a unit to support encoding agnostic code. See unit LazUnicode in package LazUtils.

The complexity is elsewhere:

- "Character" composed of codepoints in precomposed and decomposed (normalized) forms.
- Compare and sort text based on locale.
- Uppercase / Lowercase rules based on locale.
- Glyphs
- Graphemes
- etc.

I must admit I don't understand well those complex parts. I do understand codeunits and codepoints, and I understand they are the easy part.

Juha
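The distinction between code units and code points, and the reason UTF-16 is not fixed-width, can be sketched as follows. This is a Python illustration only (it does not use LazUnicode); `utf16_surrogates` is a helper name invented for the example, implementing the standard surrogate-pair formula.

```python
def utf16_surrogates(code_point):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    assert 0x10000 <= code_point <= 0x10FFFF
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


s = "a\U0001F600"  # 'a' plus a code point outside the BMP

code_points = len(s)                           # Python counts code points
utf16_units = len(s.encode("utf-16-le")) // 2  # 16-bit code units
utf8_units = len(s.encode("utf-8"))            # 8-bit code units

print(code_points, utf16_units, utf8_units)
print([hex(u) for u in utf16_surrogates(0x1F600)])
```

Code that indexes a UTF-16 string by code unit and assumes one unit per code point breaks on exactly such input, which is the "broken UTF-16 code" complained about above.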
Re: [Lazarus] String vs WideString
On Wed, 16 Aug 2017 11:33:04 +0200 Michael Schnell via Lazarus wrote:

>[...]
> But in fact "Unicode" is just a universal standard defining 64 bit
> entities.

No. 1,114,112 possible code points need at most 21 bits. Due to encoding at most 32bit.

Mattias
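The arithmetic behind those numbers, as a quick check in Python (the constant name is just for this example): Unicode has 17 planes of 65,536 code points, the highest being U+10FFFF, and every standard encoding form stores that value in at most four bytes.

```python
MAX_CODE_POINT = 0x10FFFF

# 17 planes of 2**16 code points each, numbered from 0.
total_code_points = MAX_CODE_POINT + 1
bits_needed = MAX_CODE_POINT.bit_length()

# Encoded size in bytes of the highest code point in each encoding form.
last = chr(MAX_CODE_POINT)
sizes = {
    "utf-8": len(last.encode("utf-8")),
    "utf-16": len(last.encode("utf-16-le")),
    "utf-32": len(last.encode("utf-32-le")),
}

print(total_code_points, bits_needed, sizes)
```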
Re: [Lazarus] String vs WideString
On 16.08.2017 11:32, Mattias Gaertner via Lazarus wrote:

> Anyone who wants to discuss the grand picture of strings in FPC for the millionth time should start a new topic.

Right you are. And it will be by far too late and futile, anyway, because of the reasons discussed a million times.

-Michael
Re: [Lazarus] String vs WideString
On 16.08.2017 11:08, Graeme Geldenhuys via Lazarus wrote:

> Are you suggesting that internally TStrings should have different storage for all possible languages,

Not at all. In the said paper I point out that a new fully dynamical string encoding brand would be introduced and same is used for TStrings.

Everything else will not provide an improvement of the class of problems under discussion since years.

-Michael (knowing that this will never happen)
Re: [Lazarus] String vs WideString
On 16.08.2017 11:08, Graeme Geldenhuys via Lazarus wrote:

> So it makes sense that TStrings should use UnicodeString internally to store its data. The Unicode standard is also the only standard that can support any language.

But in fact "Unicode" is just a universal standard defining 64 bit entities. The encoding of those varies: UTF-8, UTF-16 high byte first, UTF-16 low byte first, 64 bit low byte first, 64 bit high byte first. fpc and Delphi do support several of those as a string encoding (and with that creating any number of problems).

-Michael
Re: [Lazarus] String vs WideString
On Wed, 16 Aug 2017 11:09:17 +0200 Michael Schnell via Lazarus wrote:

> On 16.08.2017 10:58, Mattias Gaertner via Lazarus wrote:
> > This thread is going out of topic.
> > Please start a new thread if you want to discuss Delphi strings.
> You can't discuss fpc's string problems without mentioning Delphi, as
> they are a direct consequence as well of Delphi-compatibility as of
> Delphi-incompatibility.

The original post was about a string conversion warning. Anyone who wants to discuss the grand picture of strings in FPC for the millionth time should start a new topic.

Mattias
Re: [Lazarus] String vs WideString
On 16.08.2017 10:58, Mattias Gaertner via Lazarus wrote:

> This thread is going out of topic. Please start a new thread if you want to discuss Delphi strings.

You can't discuss fpc's string problems without mentioning Delphi, as they are a direct consequence as well of Delphi-compatibility as of Delphi-incompatibility.

-Michael
Re: [Lazarus] String vs WideString
On 2017-08-16 09:43, Michael Schnell via Lazarus wrote:

> IMHO, any implementation of TStrings that forces a conversion (just because the class uses TStrings and not due to a logical demand), is a contradiction to providing code aware strings at all.

But in FPC 3.x (using modern compiler modes - not TP or Mac) String = UnicodeString. So it makes sense that TStrings should use UnicodeString internally to store its data. The Unicode standard is also the only standard that can support any language. So all Windows code-pages can be supported with the single UnicodeString type.

Are you suggesting that internally TStrings should have different storage for all possible languages, or some RawByteString type? So if you load some non-Latin code-page text internally it still stores that text as that non-Latin bytes? That would just over-complicate the TStrings class. FPC is moving towards UnicodeString being used internally for everything in the RTL, so why must TStrings be any different.

Regards, Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal http://fpgui.sourceforge.net/
My public PGP key: http://tinyurl.com/graeme-pgp
Re: [Lazarus] String vs WideString
On Wed, 16 Aug 2017, Michael Schnell via Lazarus wrote:

> On 15.08.2017 22:45, Graeme Geldenhuys via Lazarus wrote:
> > How is that not "abuse"???
> IMHO it's a major shortcoming to define "string" as "printable text".

On the contrary. That is exactly what it means. Anything else is just a collection of bytes.

Michael.
Re: [Lazarus] String vs WideString
On 15.08.2017 22:45, Graeme Geldenhuys via Lazarus wrote:

> How is that not "abuse"???

IMHO it's a major shortcoming to define "string" as "printable text". In fact the name "String" does not suggest this at all. A "string" in my understanding just is a sequence of similar "things".

> A string type was definitely not the right choice.

Notwithstanding the discussion about the mere wording, this only would hold, if the system would provide a differently named non "printable text" basic type that comes with the features needed for such usage: reference counting, lazy copy, simple operators for concatenating and element extraction and replacement, built-in function for substring locating, ...

-Michael
Re: [Lazarus] String vs WideString
On Wed, Aug 16, 2017 at 8:53 AM, Bo Berglund via Lazarus wrote:

> Based on this experience I wanted to alert the OP of the fact that
> using AnsiString instead of string is not a cure-all for binary data,
> you need to fix the codepage too, which is what the RawByteString does
> for you

Bo, everybody has known for decades that AnsiString is not for binary data. Why do you proclaim it as a new discovery? The OP's problem was completely different. It was about text encoding. TBytes is clearly the right choice for your binary data, but this discussion is not about binary data!

What does "AnsiString instead of string" mean? String is typically an alias for AnsiString.

Your sentence about RawByteString is also wrong. There is no automatic codepage conversion for RawByteString.

Juha
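Why binary data and codepage-aware text types do not mix can be shown with a language-neutral analogy. The sketch below is Python, not FPC, and does not model AnsiString or RawByteString semantics; it only demonstrates that once bytes are interpreted as text in one encoding and re-encoded in another, the byte sequence changes, whereas a plain byte buffer (the counterpart of TBytes) is never reinterpreted.

```python
raw = bytes([0x00, 0x80, 0xFF])  # arbitrary binary payload

# Treat the payload as Latin-1 text, then store that text as UTF-8:
# every byte >= 0x80 becomes a two-byte sequence.
as_text = raw.decode("latin-1")
converted = as_text.encode("utf-8")

# A byte buffer carries no encoding, so copying it changes nothing.
buffer_copy = bytes(bytearray(raw))

print(raw.hex(), converted.hex(), buffer_copy.hex())
```

The round trip is lossless only if the reverse conversion uses exactly the same pair of encodings, which is the kind of hidden assumption that tends to break when defaults change.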
Re: [Lazarus] String vs WideString
On Wed, 16 Aug 2017 10:47:37 +0200 Michael Schnell via Lazarus wrote:

> On 15.08.2017 19:29, Luca Olivetti via Lazarus wrote:
> > It has worked extremely well and reliably until fpc 2.6.4 (i.e. with
> > string=ansistring).
>
> Does it not work in 3.x?
> I understand that storing uncoded Bytes in UTF8-Strings (hence in fpc)
> works as good as it always had, as long as all strings are defined with
> the same code branding as TStrings (and friends) is (i.e. UTF8), because
> there never will be a conversion.
>
> But it does not work in Delphi, as here TStrings is defined to be UTF-16.

This thread is going out of topic. Please start a new thread if you want to discuss Delphi strings.

Mattias
Re: [Lazarus] String vs WideString
On 15.08.2017 21:38, Ondrej Pokorny via Lazarus wrote:

> Furthermore, if you use(d) strings for binary data, just replace old string for AnsiString/RawByteString (and Char for AnsiChar, PChar for PAnsiChar) and you are good to go. Annoying but no big deal.

This only works if all tools that you use do the same. And a major tool for handling strings is TStrings and its siblings. You can hardly avoid using them.

-Michael
Re: [Lazarus] String vs WideString
On 15.08.2017 19:18, Graeme Geldenhuys via Lazarus wrote:

> Why can't that be changed to a UnicodeString or UTF8String

IMHO, any implementation of TStrings that forces a conversion (just because the class uses TStrings and not due to a logical demand), is a contradiction to providing code aware strings at all.

-Michael
Re: [Lazarus] The new kid is growing up fast
On 15.08.2017 21:40, Ondrej Pokorny via Lazarus wrote:

> Too bad that Eugene didn't decide to improve Lazarus Cocoa bindings :)

Does he use fpc as a compiler?

-Michael