Re: [Lazarus] Converting all code to use UnicodeString
On 27/09/17 09:16, Graeme Geldenhuys via Lazarus wrote:
> On 2017-09-27 03:51, Marcos Douglas B. Santos via Lazarus wrote:
>> A constant that can change...
>
> Yeah, that concept still blows my mind. [figuratively speaking] They
> should shoot the developer that came up with that idea - and the team
> leader that approved it.
>
> Regards, Graeme

comp.compilers.free-pascal.social is leaking ;)

It dates back to when? Turbo Pascal? Late 1980s / early 1990s?

Imagine this:

Developer (thinking): "The rave was great last weekend, still feeling the pain Thursday"
Developer: "We have this almost ready and this looks like a great idea"
Supervisor (thinking): "Ah the world's going to end next week anyway, who cares"
Supervisor: "OK, make it so"

;)

-L.

--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] Converting all code to use UnicodeString
On Wed, Sep 27, 2017 at 7:05 AM, Juha Manninen via Lazarus wrote:
> On Tue, Sep 26, 2017 at 10:52 PM, Marcos Douglas B. Santos via Lazarus
> [...]
> About the string constant concatenation, just use variables when it is proper:
>
> const
>   V1: string = 'a';
> var
>   S1: String;
> ... later in code ...
>   S1 := V1 + 'b';
>
> String literals can be assigned without problems as long as your
> variables are "String".
> The big table in the wiki page is intimidating, in reality the issue
> is not so complex.

I'm already doing that. It is not perfect, but it is better than having problems.
Thanks.

Best regards,
Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On Wed, Sep 27, 2017 at 5:16 AM, Graeme Geldenhuys via Lazarus wrote:
> On 2017-09-27 03:51, Marcos Douglas B. Santos via Lazarus wrote:
>>
>> A constant that can change...
>
> Yeah, that concept still blows my mind. [figuratively speaking] They should
> shoot the developer that came up with that idea - and the team leader that
> approved it.

Everybody has crazy ideas... the problem is whoever signs off on them, saying "yeah, go ahead!" :)

Regards,
Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On Tue, Sep 26, 2017 at 10:52 PM, Marcos Douglas B. Santos via Lazarus wrote:
> So we can say that Lazarus code do not use XPath to work with XML, right?

No I cannot say much about the issue. I didn't try it myself.
I understood Mattias and Michael V.C. have plans to migrate the XML units to FCL sources. Maybe they can elaborate.

> I don't use it. (Windows codepages)

Ok, then I misunderstood. :)

About the string constant concatenation, just use variables when it is proper:

const
  V1: string = 'a';
var
  S1: String;
... later in code ...
  S1 := V1 + 'b';

String literals can be assigned without problems as long as your variables are "String".
The big table in the wiki page is intimidating, in reality the issue is not so complex.

On Tue, Sep 26, 2017 at 7:29 PM, zeljko wrote:
> POS receipt printers :)

Ok maybe. I don't have one, difficult to say.

Juha
Re: [Lazarus] Converting all code to use UnicodeString
On 2017-09-27 03:51, Marcos Douglas B. Santos via Lazarus wrote:
> A constant that can change...

Yeah, that concept still blows my mind. [figuratively speaking] They should shoot the developer that came up with that idea - and the team leader that approved it.

Regards, Graeme
Re: [Lazarus] Converting all code to use UnicodeString
On Tue, Sep 26, 2017 at 5:06 PM, Howard Page-Clark via Lazarus wrote:
> On 26/09/17 20:51, Marcos Douglas B. Santos via Lazarus wrote:
>>
>> I understood that I can use like this:
>> const
>>   VALUE: string = 'áéíóú';
>>
>> Not like this:
>> const
>>   VALUE = 'áéíóú';
>>
>> Right?
>> But this is not compile:
>> const
>>   V1: string = 'a';
>>   V2: string = V1 + 'b';
>
> You can't do that in a const declaration.
> But in an implementation, the following does compile:
>
> {$J+} {$H+}
> const
>   V1: string = 'a';
>   V2: string = 'b';
>   V3: String = '';
>
> begin
>   V3:=V1 + V2;
>   WriteLn(V3);
> end.

I know this trick; it was deprecated a long time ago.
A constant that can change...

I think it may be better not to use constants in the code anymore.
But thanks, anyway.

Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On 26/09/17 20:51, Marcos Douglas B. Santos via Lazarus wrote:
> I understood that I can use like this:
> const
>   VALUE: string = 'áéíóú';
>
> Not like this:
> const
>   VALUE = 'áéíóú';
>
> Right?
> But this is not compile:
> const
>   V1: string = 'a';
>   V2: string = V1 + 'b';

You can't do that in a const declaration.
But in an implementation, the following does compile:

{$J+} {$H+}
const
  V1: string = 'a';
  V2: string = 'b';
  V3: String = '';

begin
  V3:=V1 + V2;
  WriteLn(V3);
end.
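For readers following along, here is a minimal sketch of the distinction discussed in this message: untyped (true) constants can be combined in a constant expression, while typed constants are initialised data and can only be concatenated at run time. The program and identifier names (ConstConcat, Prefix, Combined, TypedPrefix) are invented for illustration, and the strings are ASCII-only so that no encoding question interferes.

```pascal
program ConstConcat;
{$mode objfpc}{$H+}

const
  // Untyped (true) constants: the compiler can fold this concatenation.
  Prefix   = 'a';
  Combined = Prefix + 'b';

  // Typed constant: initialised data, not usable inside another
  // constant expression.
  TypedPrefix: string = 'a';

var
  S: string;
begin
  // Run-time concatenation into a String variable.
  S := TypedPrefix + 'b';
  WriteLn(Combined);
  WriteLn(S);
end.
```

Nothing in this sketch assigns to a typed constant, so it should not depend on the writable-constant behaviour that {$J+} enables.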
Re: [Lazarus] Converting all code to use UnicodeString
On Tue, Sep 26, 2017 at 9:09 AM, Juha Manninen via Lazarus wrote:
> On Tue, Sep 26, 2017 at 12:11 AM, Marcos Douglas B. Santos via Lazarus
> wrote:
>> Yeah, but DOM uses DOMString, which is WideString.
>> Lazarus uses UTF8 and have a laz2_DOM that uses "string", which is
>> UTF8, but I cannot use this unit with XPath unit, which needs a
>> TXMLDocument that works with WideString... see my point?
>
> That is a problem. I guess you can use the units with Lazarus but it
> results to many conversions between encodings.
> It should be solved somehow.

So we can say that Lazarus code does not use XPath to work with XML, right?

Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On Tue, Sep 26, 2017 at 6:31 AM, Juha Manninen via Lazarus wrote:
> On Tue, Sep 26, 2017 at 4:37 AM, Marcos Douglas B. Santos via Lazarus
> wrote:
>> But according with this table, I shouldn't do that because so many
>> problems could happen.
>> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Without_.7B.24codepage_utf8.7D_or_compilerswitch_-FcUTF8
>
> No. It works when assigning to String and that is what matters.

I understand that I can declare it like this:

const
  VALUE: string = 'áéíóú';

Not like this:

const
  VALUE = 'áéíóú';

Right?
But this does not compile:

const
  V1: string = 'a';
  V2: string = V1 + 'b';

>>> The solution is to NOT use Windows codepages.
>>> ...
>> So, no problems here and the page is outdated. OK.
>
> The page is correct but your code and/or data is outdated if it uses
> the Windows codepage encoding. :)
> Well, honestly, why do you still use it?
> Unicode has been around for decades. It solved all the horrible
> problems inherent to locale dependent codepages. Windows has supported
> full Unicode for ~18 years.
> Maybe there still is a valid reason to use codepages but I don't know
> what it is.

I don't use it.

Regards,
Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On 26.09.2017 11:31, Juha Manninen via Lazarus wrote:
> Maybe there still is a valid reason to use codepages but I don't know
> what it is.

POS receipt printers :)

zeljko
Re: [Lazarus] Converting all code to use UnicodeString
On Tue, Sep 26, 2017 at 12:11 AM, Marcos Douglas B. Santos via Lazarus wrote:
> Yeah, but DOM uses DOMString, which is WideString.
> Lazarus uses UTF8 and have a laz2_DOM that uses "string", which is
> UTF8, but I cannot use this unit with XPath unit, which needs a
> TXMLDocument that works with WideString... see my point?

That is a problem. I guess you can use the units with Lazarus, but it results in many conversions between encodings.
It should be solved somehow.

Juha
Re: [Lazarus] Converting all code to use UnicodeString
On Tue, Sep 26, 2017 at 4:37 AM, Marcos Douglas B. Santos via Lazarus wrote:
> But according with this table, I shouldn't do that because so many
> problems could happen.
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Without_.7B.24codepage_utf8.7D_or_compilerswitch_-FcUTF8

No. It works when assigning to String and that is what matters.

>> The solution is to NOT use Windows codepages.
>> ...
> So, no problems here and the page is outdated. OK.

The page is correct but your code and/or data is outdated if it uses the Windows codepage encoding. :)
Well, honestly, why do you still use it?
Unicode has been around for decades. It solved all the horrible problems inherent to locale dependent codepages. Windows has supported full Unicode for ~18 years.
Maybe there still is a valid reason to use codepages but I don't know what it is.

> Like I said, it's a hack. But, again, it was|is a great job. No doubt.

Yes. The wiki page lists 3 simple rules:
* Normally use type "String" instead of UTF8String or UnicodeString.
* Assign a constant always to a type String variable.
* Use type UnicodeString explicitly for API calls that need it.

For you I would add:
* Use Unicode instead of Windows system codepages.

With those rules the code is mostly compatible with Delphi. Not bad.

Juha
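The three rules listed in this message can be sketched in one short program. This is only an illustration under stated assumptions: ApiCall and ApiCallP are hypothetical stand-ins for real API routines that expect UTF-16 text, the other names (RulesDemo, Greeting) are invented, and the text is ASCII-only so that the example does not depend on the source file encoding.

```pascal
program RulesDemo;
{$mode objfpc}{$H+}

// Hypothetical stand-ins for API routines that expect UTF-16 text.
procedure ApiCall(const aParam: UnicodeString);
begin
  WriteLn('code units: ', Length(aParam));
end;

procedure ApiCallP(aParamP: PWideChar);
begin
  WriteLn('first code unit: ', Ord(aParamP^));
end;

const
  Greeting = 'abc';      // untyped constant, ASCII only

var
  S: String;             // rule 1: plain String for normal use
  Tmp: UnicodeString;    // rule 3: explicit UnicodeString for the API
begin
  S := Greeting;               // rule 2: assign the constant to a String variable
  ApiCall(UnicodeString(S));   // explicit cast suppresses the conversion warning
  Tmp := S;                    // compiler converts String -> UnicodeString
  ApiCallP(PWideChar(Tmp));    // pointer to the converted copy
end.
```

The temporary variable keeps the converted UTF-16 data alive for the duration of the pointer call, which is the reason the wiki recommends it for PWideChar parameters.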
Re: [Lazarus] Converting all code to use UnicodeString
I do not see how it is a hack. When you have a function taking a null-terminated string of a specific character type (in this case PWideChar) and you only have the generic string type, you don't know what format the underlying memory of the string is in, so you cannot pass it as a pointer to the function. In this case you need to convert to the explicit type. These are (potentially) two different types; they just happen to be strings.

You could think of it as if you had a function Foo(VP: PSingle) and a variable V: Number, where Number could be either Single or Double depending on some macro. You don't know which one, so to avoid passing a Double you'd need to assign it to a temporary variable to convert it to the right type.

There is nothing wrong or hacky with that approach. This is how working with functions that accept pointers works in general: you need to make sure that the pointer you pass in is of the correct type.

On Tue, Sep 26, 2017 at 4:37 AM, Marcos Douglas B. Santos via Lazarus <lazarus@lists.lazarus-ide.org> wrote:
> On Mon, Sep 25, 2017 at 9:52 PM, Juha Manninen via Lazarus wrote:
> > On Tue, Sep 26, 2017 at 3:14 AM, Marcos Douglas B. Santos via Lazarus
> > wrote:
> >> So, you mean that I cannot declare a constant without specify the
> >> type. The language allow me but it won't work?
> >
> > Yes you can declare a string constant without specifying the type.
>
> But according with this table, I shouldn't do that because so many
> problems could happen.
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Without_.7B.24codepage_utf8.7D_or_compilerswitch_-FcUTF8
>
> >> 3.1. "When a parameter type is a pointer PWideChar,
> >> you need a temporary UnicodeString variable.
> >> ...
> >> That is a ugly hack. This code doesn't make any sense, if you don't
> >> know about these Unicode issues.
> >> We need do remember that trick when we are coding... not good.
> >
> > It is not so ugly. It is actually an elegant solution. Just one
> > assignment, using the FPC's automatic conversion in a clever way. No
> > explicit conversion functions or anything.
> > The "ugly" pointer typecast is needed always, also in Delphi.
>
> The "ugly" is because we need to remember to do that instead of just
> assign the variable.
> IMHO, both design are wrong. But I understand that the problem is in
> the compiler - or RTL.
>
> >> 4. "Reading / writing text file with Windows codepage"
> >> ...
> >> The text said: "This is not compatible with Delphi ".
> >> Examples on that page are hacks.
> >
> > The solution is to NOT use Windows codepages. They can be seen as a
> > historical remain with severe inherent problems which are solved by
> > Unicode already a long ago.
> > Windows has supported full Unicode since year 2000, and supported
> > UCS-2 before that.
> > Why would anybody still use the historical Windows codepages?
>
> So, no problems here and the page is outdated. OK.
>
> >> Summary:
> >> I know that was a huge work for who made that. Lazarus is more
> >> Unicode, more compatible with Delphi, and the team could move on.
> >> Great.
> >> But you might agree with me that this is far from a good design, right?
> >
> > IMO it is not far from a good design. From FPC's point of view it is a
> > hack but you can write 100% Delphi compatible code by following just
> > few simple rules (and dumping the historical Windows codepages).
>
> Like I said, it's a hack. But, again, it was|is a great job. No doubt.
>
> Best regards,
> Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On Mon, Sep 25, 2017 at 9:52 PM, Juha Manninen via Lazarus wrote:
> On Tue, Sep 26, 2017 at 3:14 AM, Marcos Douglas B. Santos via Lazarus
> wrote:
>> So, you mean that I cannot declare a constant without specify the
>> type. The language allow me but it won't work?
>
> Yes you can declare a string constant without specifying the type.

But according to this table, I shouldn't do that because so many problems could happen.
http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Without_.7B.24codepage_utf8.7D_or_compilerswitch_-FcUTF8

>> 3.1. "When a parameter type is a pointer PWideChar,
>> you need a temporary UnicodeString variable.
>> ...
>> That is a ugly hack. This code doesn't make any sense, if you don't
>> know about these Unicode issues.
>> We need do remember that trick when we are coding... not good.
>
> It is not so ugly. It is actually an elegant solution. Just one
> assignment, using the FPC's automatic conversion in a clever way. No
> explicit conversion functions or anything.
> The "ugly" pointer typecast is needed always, also in Delphi.

The "ugly" part is that we need to remember to do that instead of just assigning the variable.
IMHO, both designs are wrong. But I understand that the problem is in the compiler, or the RTL.

>> 4. "Reading / writing text file with Windows codepage"
>> ...
>> The text said: "This is not compatible with Delphi ".
>> Examples on that page are hacks.
>
> The solution is to NOT use Windows codepages. They can be seen as a
> historical remain with severe inherent problems which are solved by
> Unicode already a long ago.
> Windows has supported full Unicode since year 2000, and supported
> UCS-2 before that.
> Why would anybody still use the historical Windows codepages?

So, no problems here and the page is outdated. OK.

>> Summary:
>> I know that was a huge work for who made that. Lazarus is more
>> Unicode, more compatible with Delphi, and the team could move on.
>> Great.
>> But you might agree with me that this is far from a good design, right?
>
> IMO it is not far from a good design. From FPC's point of view it is a
> hack but you can write 100% Delphi compatible code by following just
> few simple rules (and dumping the historical Windows codepages).

Like I said, it's a hack. But, again, it was|is a great job. No doubt.

Best regards,
Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On Tue, Sep 26, 2017 at 3:14 AM, Marcos Douglas B. Santos via Lazarus wrote:
> So, you mean that I cannot declare a constant without specify the
> type. The language allow me but it won't work?

Yes, you can declare a string constant without specifying the type.

> 3.1. "When a parameter type is a pointer PWideChar,
> you need a temporary UnicodeString variable.
> ...
> That is a ugly hack. This code doesn't make any sense, if you don't
> know about these Unicode issues.
> We need do remember that trick when we are coding... not good.

It is not so ugly. It is actually an elegant solution. Just one assignment, using FPC's automatic conversion in a clever way. No explicit conversion functions or anything.
The "ugly" pointer typecast is always needed, also in Delphi.

> 4. "Reading / writing text file with Windows codepage"
> ...
> The text said: "This is not compatible with Delphi ".
> Examples on that page are hacks.

The solution is to NOT use Windows codepages. They can be seen as a historical remnant with severe inherent problems that Unicode solved a long time ago.
Windows has supported full Unicode since the year 2000, and supported UCS-2 before that.
Why would anybody still use the historical Windows codepages?

> Summary:
> I know that was a huge work for who made that. Lazarus is more
> Unicode, more compatible with Delphi, and the team could move on.
> Great.
> But you might agree with me that this is far from a good design, right?

IMO it is not far from a good design. From FPC's point of view it is a hack, but you can write 100% Delphi compatible code by following just a few simple rules (and dumping the historical Windows codepages).

Juha
Re: [Lazarus] Converting all code to use UnicodeString
On Mon, Sep 25, 2017 at 7:52 PM, Juha Manninen via Lazarus wrote:
> Marcos Douglas, this wiki page answers all your questions about using
> Unicode with Lazarus:
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus

OK, let's talk:

1. "Using UTF-8 in non-LCL programs"

"In a non-LCL project add a dependency for LazUtils package. Then add LazUTF8 unit in the uses section of main program file. It must be near the beginning, just after the critical memory managers and threading stuff (e.g. cmem, heaptrc, cthreads)."

Indeed, that was very good. Thanks.
That solved one of my questions. I tested it and it worked perfectly.
I would say that it should be part of the compiler, not of a Lazarus package, because this is a basic thing that should work without another "3rd lib".

2. "Assign a constant always to a type String variable."

So, you mean that I cannot declare a constant without specifying the type? The language allows it, but it won't work?

3. "Calling API functions that use WideString or UnicodeString"

"When a parameter type is WideString or UnicodeString, you can just pass a String to it. The compiler converts data automatically. There will be a warning about converting from AnsiString to UnicodeString which can be either ignored or suppressed by typecasting the String to UnicodeString."

Then the example:

=== code begin ===

procedure ApiCall(aParam: UnicodeString); // Definition.
...
ApiCall(S);                // Call with String S, ignore warning.
ApiCall(UnicodeString(S)); // Call with String S, suppress warning.

=== code end ===

All these warnings are so annoying. I understand the point here, but I don't like to see any hint or warning. I need to resolve them all.
But I am not sure which is more annoying: typecasting all arguments or ignoring all the warnings.

3.1. "When a parameter type is a pointer PWideChar, you need a temporary UnicodeString variable. Assign your String to it. The compiler then converts its data. Then typecast the temporary variable to PWideChar."

=== code begin ===

procedure ApiCallP(aParamP: PWideChar); // Definition.
...
var
  Tmp: UnicodeString;       // Temporary variable.
...
Tmp := S;                   // Assign String -> UnicodeString.
ApiCallP(PWideChar(Tmp));   // Call with temp variable, typecast to pointer.

=== code end ===

That is an ugly hack. This code doesn't make any sense if you don't know about these Unicode issues.
We need to remember that trick when we are coding... not good.

4. "Reading / writing text file with Windows codepage"

"This is not compatible with Delphi nor with former Lazarus code. In practice you must encapsulate the code dealing with system codepage and convert the data to UTF-8 as quickly as possible."

The text said: "This is not compatible with Delphi".
The examples on that page are hacks.

5. "CodePoint functions for encoding agnostic code"

I was glad to learn that there is a unit for working with code points that is agnostic as to whether the encoding is UTF-8 or UTF-16. I will use it. Thanks again.

On Mon, Sep 25, 2017 at 8:01 PM, Juha Manninen via Lazarus wrote:
> And more ...
>
> Marcos Douglas, the Unicode solution in Lazarus works amazingly well
> when your data is Unicode from the start.
> It only has trouble with Windows system codepages but they can be
> converted, too.

Nowadays, I'm only using Windows so...

> Question: what is the fundamental problem? Why can't you use the
> system as it is advertised and documented?

I already described my issues in the first email. Please see the first email and then one of my answers to Mattias about WideString, DOM, etc.

Summary:
I know that it was a huge amount of work for those who did it. Lazarus is more Unicode, more compatible with Delphi, and the team could move on. Great.
But you might agree with me that this is far from a good design, right?

Best regards,
Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
And more ...

Marcos Douglas, the Unicode solution in Lazarus works amazingly well when your data is Unicode from the start.
It only has trouble with Windows system codepages but they can be converted, too.

Question: what is the fundamental problem? Why can't you use the system as it is advertised and documented?

Juha
Re: [Lazarus] Converting all code to use UnicodeString
Marcos Douglas, this wiki page answers all your questions about using Unicode with Lazarus:
http://wiki.freepascal.org/Unicode_Support_in_Lazarus

On Mon, Sep 25, 2017 at 9:19 PM, Ondrej Pokorny via Lazarus wrote:
> You will have to write your own methods with IFDEF-ed code for things
> where it matters (read/write from/to buffer, char-by-char iterations etc.).

For iterating codepoints or even "Unicode characters" (*) you don't need IFDEFs.
Unit LazUnicode provides helper functions and iterators for it.

(*) Unicode character here includes combining codepoints which covers most practical use cases at least with western languages.

Juha
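To make "iterating codepoints" concrete, here is a hand-rolled sketch that walks a UTF-8 String from lead byte to lead byte. It deliberately does not use LazUnicode, whose helpers and iterators are the recommended route in this message; the names Utf8SeqLen and CountCodePoints are invented for this example, and the code assumes well-formed UTF-8 input.

```pascal
program CodePointWalk;
{$mode objfpc}{$H+}

// Byte length of the UTF-8 sequence that starts with lead byte B.
// Assumes well-formed input; continuation or invalid bytes count as 1.
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then
    Result := 1
  else if (B and $E0) = $C0 then
    Result := 2
  else if (B and $F0) = $E0 then
    Result := 3
  else if (B and $F8) = $F0 then
    Result := 4
  else
    Result := 1;
end;

// Count codepoints by hopping from one lead byte to the next.
function CountCodePoints(const S: String): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := 1;
  while i <= Length(S) do
  begin
    Inc(i, Utf8SeqLen(Ord(S[i])));
    Inc(Result);
  end;
end;

var
  S: String;
begin
  // Bytes of "aé" in UTF-8, written as char codes so that the
  // source file encoding plays no role.
  S := 'a'#$C3#$A9;
  WriteLn('bytes: ', Length(S));
  WriteLn('codepoints: ', CountCodePoints(S));
end.
```

The sketch counts codepoints only; combining sequences, which the footnote in this message mentions, would still be counted as separate codepoints.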
Re: [Lazarus] Converting all code to use UnicodeString
On Mon, Sep 25, 2017 at 6:23 PM, Sven Barth via Lazarus wrote:
> On 25.09.2017 23:11, Marcos Douglas B. Santos via Lazarus wrote:
[...]
>>> The flags are -MDelphiUnicode, -MDelphi or -MObjFPC.
>>> But they only compile the units with sources in the unit path, which
>>> excludes all FPC units. Also keep in mind that the system unit and the
>>> RTL require a lot of low level functions, which require separate
>>> versions.
>>
>> Which make this flags useless for that. It should be all code (my,
>> RTL, Lazarus, etc) to make this work using one type of string.
>
> No, because especially the RTL and FCL is usually provided precompiled.
> Thus you can't change the string type anymore afterwards without
> recompiling all the code.

That's what I am talking about. I use FPC and Lazarus from sources. I compile both. I have never used an installer...
Maybe you already answered this in another way, but: in that case, can I compile FPC and Lazarus with these flags (all strings = UnicodeString) and have everything work like that?

>> I can help in a high level way (Classes, Components, etc) not in the
>> compiler level.
>> But how can I know about these tasks? May I just pick one in bug
>> tracker that I want? How to know who is working on each task, which is
>> more important?
>
> Currently noone is working on it.

:-O

> A first step would be to add modeswitch headers to all units that must
> not use a specific mode (e.g. the System, ObjPas and some more language
> support units) like this:
>
> === code begin ===
>
> {$ifdef FPC_UNICODE_RTL}
> {$modeswitch unicodestrings}
> {$endif}
>
> === code end ===
>
> Once this is done one can test to compile the RTL, FCL and packages with
> FPC_UNICODE_RTL defined and see what blows and fix that step by step...
>
> Alternatively a constant in the System unit might be better so that one
> can check like this:
>
> === code begin ===
>
> // System unit
> {$ifdef FPC_UNICODE_RTL}
> FpcRtlIsUnicode = true;
> {$else}
> FpcRtlIsUnicode = false;
> {$endif}
>
> // some other unit
> {$if FpcRtlIsUnicode}
> {$modeswitch unicodestrings}
> {$endif}
>
> === code end ===

I've put {$modeswitch unicodestrings} in two simple programs (CLI and GUI) and... CRASH!
Imagine working with this at the compiler level... :)

My first thought about this is: every argument of all classes and functions should be a raw string, i.e. RawByteString. There may be some other types (UTF8String, UTF16String, etc.), but only for users to use at the high level.
For example: if the user knows that a file was encoded in UTF8, he/she will use UTF8String only to receive that buffer.
Then every single RTL class/function that works with strings should check which encoding was used (and we already have this today). These functions will receive a "string", check which encoding it has (i.e. UTF8String, following the example), and pass it to another built-in private function to do the job.
"UnicodeString", as we know it today, shouldn't exist. It does not make any sense. There should be only RawByteString (which should simply be "string") and the other types that define the encoding, as I said above, used only once to receive the buffer.

> Or if one wants to compile with -Municodestrings than instead the core
> units need to be protected with
>
> === code begin ===
>
> {$modeswitch unicodestrings-}
>
> === code end ===
>
> I'm currently not sure what would be the better approach in the long
> term... :/

Just guessing: the default is not Unicode, so there should be no need for {$modeswitch unicodestrings-}.

Regards,
Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On Mon, Sep 25, 2017 at 6:10 PM, Sven Barth via Lazarus wrote:
> On 25.09.2017 22:18, Marcos Douglas B. Santos via Lazarus wrote:
>> [...]
>> Yes, but using {$modeswitch unicodestrings}, at least in a certain
>> unit, should work with the same code between compilers because
>> "string", for that unit, is UnicodeString as Delphi string is, no?
>
> Yes, but it does not change the types of functions, classes, etc. that
> are used. They have the types they were compiled with while you are
> using a different string type. So you can't simply override a virtual
> method for example that has a String argument that is in fact a
> AnsiString with a method that has a String that's a UnicodeString as
> argument. So of course there will be warnings in case you're passing
> UnicodeString variables to AnsiString variables.

I saw that many RTL functions have an overload like this:

Function FileExists (Const FileName : RawByteString) : Boolean;
Function FileExists (Const FileName : UnicodeString) : Boolean;

The first one calls the second:

Function FileExists (Const FileName : RawByteString) : Boolean;
begin
  Result:=FileExists(UnicodeString(FileName));
end;

My question is: no matter what the encoding of FileName: RawByteString is, if I cast it to UnicodeString, will I not lose any characters?

>> Yes, Lazarus do that by default. But did you see in my examples, at
>> the first email, how many inconsistencies I got, using just Lazarus
>> and change chars in one simple constant?
>
> Note: I'll ignore the GUI example, cause Ondrej might be better for that.

No problem.

> For the console you need to keep in mind that the console - at least on
> Windows - has a code page as well. On my Linux - which is set to UTF-8 -
> your example works without any problem, but if I use Wine I get the same
> output as you.

Ok, but the compiler knows whether a program is a CLI program, I believe... so it could change those variables, DefaultSystemCodePage, DefaultFileSystemCodePage...
For users (developers) this is not clear, do you agree?

>>[...]
>> I know almost nothing about compilers. But IMHO, the compiler should
>> have which it already have: "string", which is an alias.
>> Then, for each OS, we should pass one argument like (simplifying):
>> -S=UnicodeString or -S=AnsiString... something like that (I hope you
>> understood).
>
> The compiler is not the problem. It's that especially the low level part
> of the RTL needs to be aware of the String type and handle it correctly.
> Essentially all functions will need to be checked whether they can
> correctly handle String (as in the generic string type) or are specific
> for AnsiString and thus would need to be adjusted.

I see...

>> I mean, we should not have overload functions, but only one type of
>> string. Even if that type may be RawByteString.
>
> You are wrong. Think about functions reading or writing data from/to
> files. Especially when the data was written with the other String type
> in mind.

It is normal for external data (files) to have different encodings. IMO, only in these cases should we care about encoding, because external data is outside of our code and we cannot control it.

>> After compiled, we will have a RTL that will work follow the "-S" argument.
>>
>>> So the RTL will be adjusted in a way that it can be easily
>>> compiled with String = UnicodeString or as is now with String =
>>> AnsiString(CP_ACP). But we are not there yet.
>>
>> Now we're talking.
>> Almost everyone that know how to work with "the group of strings",
>> making them compatible between FPC and Delphi, are saying that Unicode
>> is already done and everything is fine. You are the first one to say
>> that is not complete yet. Thank you. I'm glad to know that I'm not
>> crazy.
>
> Unicode itself is working, but in the form of UTF-8, not UTF-16 and as
> such it is as compatible to Delphi as it can currently get with some
> caveats when the specific type is important.

Well, I only set {$mode delphi} and {$modeswitch unicodestrings}, never left Lazarus, and still got strange results... it looks like the FPC flags are not compatible with FPC itself or with Lazarus.

Again, I know that you, Mattias and many others understand this perfectly. But my examples were very simple, and still they didn't work perfectly using just FPC and Lazarus.

Regards,
Marcos Douglas
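On the question of casting a RawByteString to UnicodeString: the conversion goes by the code page recorded in the string itself, so what matters is whether that tag matches the actual bytes. The following is a minimal sketch assuming FPC 3.0 or later; the program and variable names are invented. On Unix-like targets the conversion step may additionally need a widestring manager such as the cwstring unit, which is an assumption to check against your setup.

```pascal
program CodePageCast;
{$mode objfpc}{$H+}

var
  Raw: RawByteString;
  W: UnicodeString;
begin
  // Bytes of "aé" in UTF-8, written as char codes so that the
  // source file encoding plays no role.
  Raw := 'a'#$C3#$A9;

  // Tag the bytes as UTF-8 without converting them.
  SetCodePage(Raw, CP_UTF8, False);
  WriteLn('declared code page: ', StringCodePage(Raw));

  // The cast converts according to the code page tag set above.
  W := UnicodeString(Raw);
  WriteLn('bytes: ', Length(Raw), '  UTF-16 code units: ', Length(W));
end.
```

If the tag does not describe the bytes (for example UTF-8 data tagged with a Windows code page), the cast converts according to the wrong table, and that is where characters can get lost or mangled.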
Re: [Lazarus] Converting all code to use UnicodeString
On 25.09.2017 23:11, Marcos Douglas B. Santos via Lazarus wrote:
>>> [...]
>>> I know almost nothing about compilers. But IMHO, the compiler should
>>> have which it already have: "string", which is an alias.
>>> Then, for each OS, we should pass one argument like (simplifying):
>>> -S=UnicodeString or -S=AnsiString... something like that (I hope you
>>> understood).
>>
>> The flags are -MDelphiUnicode, -MDelphi or -MObjFPC.
>> But they only compile the units with sources in the unit path, which
>> excludes all FPC units. Also keep in mind that the system unit and the
>> RTL require a lot of low level functions, which require separate
>> versions.
>
> Which make this flags useless for that. It should be all code (my,
> RTL, Lazarus, etc) to make this work using one type of string.

No, because the RTL and FCL in particular are usually provided precompiled. Thus you can't change the string type afterwards without recompiling all the code.

>> Unicode <> UnicodeString
>> Unicode is working with UTF-8.
>> If you want a Delphi compatible UTF-16 RTL and packages you are welcome
>> to help the FPC team.
>
> I can help in a high level way (Classes, Components, etc) not in the
> compiler level.
> But how can I know about these tasks? May I just pick one in bug
> tracker that I want? How to know who is working on each task, which is
> more important?

Currently no one is working on it. A first step would be to add modeswitch headers to all units that must not use a specific mode (e.g. the System, ObjPas and some more language support units) like this:

=== code begin ===

{$ifdef FPC_UNICODE_RTL}
{$modeswitch unicodestrings}
{$endif}

=== code end ===

Once this is done one can try compiling the RTL, FCL and packages with FPC_UNICODE_RTL defined, see what blows up, and fix that step by step...

Alternatively a constant in the System unit might be better so that one can check like this:

=== code begin ===

// System unit
{$ifdef FPC_UNICODE_RTL}
FpcRtlIsUnicode = true;
{$else}
FpcRtlIsUnicode = false;
{$endif}

// some other unit
{$if FpcRtlIsUnicode}
{$modeswitch unicodestrings}
{$endif}

=== code end ===

Or, if one wants to compile with -Municodestrings, then the core units need to be protected with this instead:

=== code begin ===

{$modeswitch unicodestrings-}

=== code end ===

I'm currently not sure what would be the better approach in the long term... :/

Regards,
Sven
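As a rough illustration of the first approach above, here is a minimal sketch of a program that opts into the modeswitch only when a define is set. FPC_UNICODE_RTL is the define proposed in the message and is not known to exist in any released RTL; the program name and the StringKind helper are made up for this example. It relies on the documented effect of the modeswitch that Char becomes WideChar.

```pascal
program RtlModeProbe;

{$mode objfpc}{$H+}

// FPC_UNICODE_RTL is the define proposed in the message above (an assumption,
// not an existing RTL symbol). Pass -dFPC_UNICODE_RTL to the compiler to set it.
{$ifdef FPC_UNICODE_RTL}
  {$modeswitch unicodestrings}
{$endif}

// Hypothetical helper: reports which type "String" maps to in this file.
function StringKind: AnsiString;
begin
  // With the modeswitch active Char is WideChar (2 bytes),
  // otherwise it is AnsiChar (1 byte).
  if SizeOf(Char) = 2 then
    Result := 'UnicodeString'
  else
    Result := 'AnsiString';
end;

begin
  WriteLn('String in this file is: ', StringKind);
end.
```

Compiling once with `fpc RtlModeProbe.pas` and once with `fpc -dFPC_UNICODE_RTL RtlModeProbe.pas` should show the two cases. Only this file is affected; units that were compiled earlier, such as the RTL, keep the string type they were built with.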
Re: [Lazarus] Converting all code to use UnicodeString
Hi Mattias,

On Mon, Sep 25, 2017 at 5:45 PM, Mattias Gaertner via Lazarus wrote:
> On Mon, 25 Sep 2017 17:18:05 -0300
> "Marcos Douglas B. Santos via Lazarus"
> wrote:
>
>>[...]
>
> Your first email does not contain a simple Lazarus+string example. I
> see an example for LCL+unicodestring.

Yes, because I tried to make the code compatible. If Delphi uses UTF-16, I thought there was some logic in using the same encoding.

>>[...]
>> I know almost nothing about compilers. But IMHO, the compiler should
>> have which it already have: "string", which is an alias.
>> Then, for each OS, we should pass one argument like (simplifying):
>> -S=UnicodeString or -S=AnsiString... something like that (I hope you
>> understood).
>
> The flags are -MDelphiUnicode, -MDelphi or -MObjFPC.
> But they only compile the units with sources in the unit path, which
> excludes all FPC units. Also keep in mind that the system unit and the
> RTL require a lot of low level functions, which require separate
> versions.

Which makes these flags useless for that. All code (mine, RTL, Lazarus, etc.) would have to be compiled that way for one string type to work everywhere.

>> I mean, we should not have overload functions, but only one type of
>> string. Even if that type may be RawByteString.
>
> From a user pov: Yes, that's what Lazarus is recommending: Simply use
> one string type, and that is String. The confusion starts when you start
> using different string types.

Yeah, but DOM uses DOMString, which is WideString. Lazarus uses UTF-8 and has laz2_DOM, which uses "string" (UTF-8), but I cannot use that unit with the XPath unit, which needs a TXMLDocument that works with WideString... see my point?

The RTL was ANSI only and now has overloads for UnicodeString... but according to Sven, the Unicode support is not finished yet. And what about the huge number of warnings between these units? Do you think it is normal to need a cast on everything?

> Unicode <> UnicodeString
> Unicode is working with UTF-8.
> If you want a Delphi compatible UTF-16 RTL and packages you are welcome
> to help the FPC team.

I can help at a high level (classes, components, etc.), not at the compiler level. But how can I find out about these tasks? May I just pick any one I want from the bug tracker? How do I know who is working on each task, and which is more important?

Best regards,
Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On 25.09.2017 22:18, Marcos Douglas B. Santos via Lazarus wrote:
> Hi Sven,
> First of all, thanks for your time to answer me.
>
> On Mon, Sep 25, 2017 at 4:43 PM, Sven Barth via Lazarus
> wrote:
>> On 25.09.2017 20:51, Marcos Douglas B. Santos via Lazarus wrote:
>>> I understand use IFDEF to compile in different platforms like Windows
>>> vs... err... Haiku. Of Linux vs Nintendo Wii...
>>> But why should I use IFDEF in a code that should be the same in both
>>> compilers (FPC vs Delphi)?
>>
>> Because they *aren't* the same. In Delphi String = UnicodeString while
>> in the RTL, the FCL and the LCL String = AnsiString(CP_ACP) and using a
>> different modeswitch *does not* change that, cause modes are unit specific.
>
> Yes, but using {$modeswitch unicodestrings}, at least in a certain
> unit, should work with the same code between compilers because
> "string", for that unit, is UnicodeString as Delphi string is, no?

Yes, but it does not change the types of the functions, classes, etc. that are used. They keep the types they were compiled with while you are using a different string type. So you can't, for example, simply override a virtual method whose String argument is in fact an AnsiString with a method whose String argument is a UnicodeString. And of course there will be warnings when you assign UnicodeString variables to AnsiString variables.

>> Especially the RTL is not ready for String = UnicodeString. So your best
>> bet is to use UTF8String or set the default code page to UTF8 (the LCL
>> units do that by default if I remember correctly, but Ondrej can confirm
>> or deny that).
>
> Yes, Lazarus do that by default. But did you see in my examples, at
> the first email, how many inconsistencies I got, using just Lazarus
> and change chars in one simple constant?

Note: I'll ignore the GUI example, because Ondrej is probably better placed to answer that.

For the console you need to keep in mind that the console - at least on Windows - has a code page as well. On my Linux - which is set to UTF-8 - your example works without any problem, but if I use Wine I get the same output as you.

>>> It will be slower than now? Yes, maybe... but we already use objects!
>>> If you want 500% performance, use pointers, records and procedures
>>> with whatever encode you want. But if you use objects, the overhead
>>> already exists... and who cares? 1ms... 2ms... even 2s that you may
>>> lost using UTF16? (or UTF8, but make all equal!) So? The world is
>>> using Ruby and they don't care... or Python, Java... and they store in
>>> UTF16 too, which requires a double of space... but if it works and the
>>> code is clean, should be more important, don't agree?
>>
>> For FPC also more restricted targets are to be kept in mind (AVR, DOS,
>> etc.).
>
> I know almost nothing about compilers. But IMHO, the compiler should
> have which it already have: "string", which is an alias.
> Then, for each OS, we should pass one argument like (simplifying):
> -S=UnicodeString or -S=AnsiString... something like that (I hope you
> understood).

The compiler is not the problem. The point is that the low level part of the RTL in particular needs to be aware of the String type and handle it correctly. Essentially every function will need to be checked to see whether it can correctly handle String (as in the generic string type) or is specific to AnsiString and thus needs to be adjusted.

> I mean, we should not have overload functions, but only one type of
> string. Even if that type may be RawByteString.

You are wrong. Think about functions that read or write data from/to files, especially when the data was written with the other String type in mind.

>
> After compiled, we will have a RTL that will work follow the "-S" argument.
>
>> So the RTL will be adjusted in a way that it can be easily
>> compiled with String = UnicodeString or as is now with String =
>> AnsiString(CP_ACP). But we are not there yet.
>
> Now we're talking.
> Almost everyone that know how to work with "the group of strings",
> making them compatible between FPC and Delphi, are saying that Unicode
> is already done and everything is fine. You are the first one to say
> that is not complete yet. Thank you. I'm glad to know that I'm not
> crazy.

Unicode itself is working, but in the form of UTF-8, not UTF-16, and as such it is as compatible with Delphi as it can currently get, with some caveats when the specific type is important.

Regards,
Sven
Re: [Lazarus] Converting all code to use UnicodeString
On Mon, 25 Sep 2017 17:18:05 -0300 "Marcos Douglas B. Santos via Lazarus" wrote:
>[...]
> Yes, but using {$modeswitch unicodestrings}, at least in a certain
> unit, should work with the same code between compilers because
> "string", for that unit, is UnicodeString as Delphi string is, no?

The important thing is "in a certain unit". As soon as you access strings from other units, you have to consider their type.

> > Especially the RTL is not ready for String = UnicodeString. So your best
> > bet is to use UTF8String or set the default code page to UTF8 (the LCL
> > units do that by default if I remember correctly, but Ondrej can confirm
> > or deny that).

Unit LazUTF8 does it.

> Yes, Lazarus do that by default. But did you see in my examples, at
> the first email, how many inconsistencies I got, using just Lazarus
> and change chars in one simple constant?

Your first email does not contain a simple Lazarus+string example. I see an example for LCL+unicodestring.

>[...]
> I know almost nothing about compilers. But IMHO, the compiler should
> have which it already have: "string", which is an alias.
> Then, for each OS, we should pass one argument like (simplifying):
> -S=UnicodeString or -S=AnsiString... something like that (I hope you
> understood).

The flags are -MDelphiUnicode, -MDelphi or -MObjFPC. But they only compile the units with sources in the unit path, which excludes all FPC units. Also keep in mind that the system unit and the RTL require a lot of low level functions, which require separate versions.

> I mean, we should not have overload functions, but only one type of
> string. Even if that type may be RawByteString.

From a user pov: Yes, that's what Lazarus is recommending: Simply use one string type, and that is String. The confusion starts when you start using different string types.

> After compiled, we will have a RTL that will work follow the "-S" argument.

The RTL already has a lot of IFDEFs for the coming UnicodeString RTL.

> > So the RTL will be adjusted in a way that it can be easily
> > compiled with String = UnicodeString or as is now with String =
> > AnsiString(CP_ACP). But we are not there yet.
>
> Now we're talking.
> Almost everyone that know how to work with "the group of strings",
> making them compatible between FPC and Delphi, are saying that Unicode
> is already done and everything is fine. You are the first one to say
> that is not complete yet. Thank you. I'm glad to know that I'm not
> crazy.

Unicode <> UnicodeString
Unicode is working with UTF-8.
If you want a Delphi compatible UTF-16 RTL and packages you are welcome to help the FPC team.

Mattias
Re: [Lazarus] Converting all code to use UnicodeString
Hi Sven,

First of all, thanks for taking the time to answer me.

On Mon, Sep 25, 2017 at 4:43 PM, Sven Barth via Lazarus wrote:
> On 25.09.2017 20:51, Marcos Douglas B. Santos via Lazarus wrote:
>> I understand use IFDEF to compile in different platforms like Windows
>> vs... err... Haiku. Of Linux vs Nintendo Wii...
>> But why should I use IFDEF in a code that should be the same in both
>> compilers (FPC vs Delphi)?
>
> Because they *aren't* the same. In Delphi String = UnicodeString while
> in the RTL, the FCL and the LCL String = AnsiString(CP_ACP) and using a
> different modeswitch *does not* change that, cause modes are unit specific.

Yes, but using {$modeswitch unicodestrings}, at least in a certain unit, should make the same code work in both compilers, because "string" in that unit is UnicodeString, just as the Delphi string is, no?

> Especially the RTL is not ready for String = UnicodeString. So your best
> bet is to use UTF8String or set the default code page to UTF8 (the LCL
> units do that by default if I remember correctly, but Ondrej can confirm
> or deny that).

Yes, Lazarus does that by default. But did you see in the examples in my first email how many inconsistencies I got, using just Lazarus and changing characters in one simple constant?

>> It will be slower than now? Yes, maybe... but we already use objects!
>> If you want 500% performance, use pointers, records and procedures
>> with whatever encode you want. But if you use objects, the overhead
>> already exists... and who cares? 1ms... 2ms... even 2s that you may
>> lost using UTF16? (or UTF8, but make all equal!) So? The world is
>> using Ruby and they don't care... or Python, Java... and they store in
>> UTF16 too, which requires a double of space... but if it works and the
>> code is clean, should be more important, don't agree?
>
> For FPC also more restricted targets are to be kept in mind (AVR, DOS,
> etc.).

I know almost nothing about compilers. But IMHO, the compiler should keep what it already has: "string", which is an alias. Then, for each OS, we would pass one argument like (simplifying): -S=UnicodeString or -S=AnsiString... something like that (I hope you understand). I mean, we should not have overloaded functions, but only one type of string, even if that type is RawByteString. Once compiled, we would have an RTL that follows the "-S" argument.

> So the RTL will be adjusted in a way that it can be easily
> compiled with String = UnicodeString or as is now with String =
> AnsiString(CP_ACP). But we are not there yet.

Now we're talking. Almost everyone who knows how to work with "the group of strings", making them compatible between FPC and Delphi, says that Unicode is already done and everything is fine. You are the first one to say that it is not complete yet. Thank you. I'm glad to know that I'm not crazy.

Best regards,
Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On 25.09.2017 20:51, Marcos Douglas B. Santos via Lazarus wrote:
> I understand use IFDEF to compile in different platforms like Windows
> vs... err... Haiku. Of Linux vs Nintendo Wii...
> But why should I use IFDEF in a code that should be the same in both
> compilers (FPC vs Delphi)?

Because they *aren't* the same. In Delphi String = UnicodeString, while in the RTL, the FCL and the LCL String = AnsiString(CP_ACP), and using a different modeswitch *does not* change that, because modes are unit specific.

> Is it because the string type is not Unicode? OK, so I want to convert
> all to use UTF16, ie, UnicodeString (wrong name) and make ALL code
> compatible. But this is looks like not possible without:
>
> * IFDEFs
> * know a few {modes}
> * know what type of string I'm working on
>
>
> If there is an argument in the compiler to compile it with the
> definition of "all string is an UnicodeString like Java, C#, Delphi
> and all them", would be great.
> Then we will compile the compiler and Lazarus with the same type of
> string and everything will work.

The RTL in particular is not ready for String = UnicodeString. So your best bet is to use UTF8String or to set the default code page to UTF-8 (the LCL units do that by default if I remember correctly, but Ondrej can confirm or deny that).

> It will be slower than now? Yes, maybe... but we already use objects!
> If you want 500% performance, use pointers, records and procedures
> with whatever encode you want. But if you use objects, the overhead
> already exists... and who cares? 1ms... 2ms... even 2s that you may
> lost using UTF16? (or UTF8, but make all equal!) So? The world is
> using Ruby and they don't care... or Python, Java... and they store in
> UTF16 too, which requires a double of space... but if it works and the
> code is clean, should be more important, don't agree?

FPC also has to keep more restricted targets in mind (AVR, DOS, etc.). So the RTL will be adjusted in a way that it can easily be compiled either with String = UnicodeString or, as now, with String = AnsiString(CP_ACP). But we are not there yet.

Regards,
Sven
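The two options mentioned in this reply (use UTF8String, or set the default code page to UTF-8) might look roughly like the sketch below. It assumes FPC 3.0 or later, where the System unit provides UTF8Encode, UTF8Decode, SetMultiByteConversionCodePage and DefaultSystemCodePage. It is not taken from the LCL sources; the program name and variable names are made up for illustration.

```pascal
program Utf8Options;

{$mode objfpc}{$H+}

var
  U: UnicodeString;   // UTF-16 text, as used by the Delphi-style APIs
  A: UTF8String;      // UTF-8 text, as used by Lazarus and the LCL
begin
  // Option 1: keep text in UTF8String and convert explicitly at the
  // boundaries, which avoids implicit-conversion warnings.
  U := 'caf';
  U := U + WideChar($00E9);   // U+00E9, built without relying on the source file encoding
  A := UTF8Encode(U);         // UTF-16 -> UTF-8
  WriteLn('UTF-16 code units: ', Length(U));
  WriteLn('UTF-8 bytes:       ', Length(A));
  U := UTF8Decode(A);         // UTF-8 -> UTF-16

  // Option 2: declare that plain AnsiString data is UTF-8 by default.
  SetMultiByteConversionCodePage(CP_UTF8);
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```

U+00E9 takes one UTF-16 code unit but two UTF-8 bytes, so the two lengths differ (4 and 5). That is the kind of difference that matters once code indexes into strings or writes them to a buffer.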
Re: [Lazarus] Converting all code to use UnicodeString
I understand using IFDEF to compile on different platforms, like Windows vs... err... Haiku. Or Linux vs Nintendo Wii... But why should I use IFDEF in code that should be the same in both compilers (FPC vs Delphi)?

Is it because the string type is not Unicode? OK, so I want to convert everything to use UTF-16, i.e. UnicodeString (wrong name), and make ALL code compatible. But this looks impossible without:

* IFDEFs
* knowing a few {modes}
* knowing what type of string I'm working with

If there were a compiler argument meaning "every string is a UnicodeString, like in Java, C#, Delphi and all of them", that would be great. Then we would compile the compiler and Lazarus with the same string type and everything would work.

Will it be slower than now? Yes, maybe... but we already use objects! If you want 500% performance, use pointers, records and procedures with whatever encoding you want. But if you use objects, the overhead already exists... and who cares about the 1ms... 2ms... even 2s that you may lose using UTF-16? (Or UTF-8, but make it all the same!) So? The world is using Ruby and they don't care... or Python, Java... and they store in UTF-16 too, which requires double the space... but working, clean code should be more important, don't you agree?

Best regards,
Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On Mon, Sep 25, 2017 at 3:19 PM, Ondrej Pokorny via Lazarus wrote:
> On 25.09.2017 20:02, Marcos Douglas B. Santos via Lazarus wrote:
>>
>> May I code using just "string"?
>
>
> Yes. LCL is ANSI/UTF8 only, so is TStrings.
>
> You can write Lazarus+Delphi compatible code without a lot of problems. Just
> use the string type. The only thing you have to be aware is that in Delphi
> you work with UTF-16 and in Lazarus with UTF-8 - but for most cases it
> doesn't really matter. You will have to write your own methods with IFDEF-ed
> code for things where it matters (read/write from/to buffer, char-by-char
> iterations etc.).

But my code produced different outputs and/or warnings using only Lazarus! You said compatible. What about the warnings? Why do I need IFDEF-ed code if the code "is" compatible?

For example, is this code compatible with Delphi, and does it work there?
https://github.com/mdbs99/james/blob/a9ad48fb8eaf4f11c6dd7b65d6ac2f63e6fc09fb/test/james.data.tests.pas#L57

Best regards,
Marcos Douglas
Re: [Lazarus] Converting all code to use UnicodeString
On 25.09.2017 20:02, Marcos Douglas B. Santos via Lazarus wrote:
> May I code using just "string"?

Yes. LCL is ANSI/UTF8 only, so is TStrings.

You can write Lazarus+Delphi compatible code without a lot of problems. Just use the string type. The only thing you have to be aware of is that in Delphi you work with UTF-16 and in Lazarus with UTF-8 - but for most cases it doesn't really matter. You will have to write your own methods with IFDEF-ed code for things where it matters (read/write from/to buffer, char-by-char iterations etc.).

Ondrej
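A minimal sketch of the kind of IFDEF-ed helper meant here, for the char-by-char case: counting code points in a "string" that is UTF-8 under FPC/Lazarus and UTF-16 under Delphi. CodePointCount and Sample are hypothetical names made up for this example. It assumes well-formed input and that the FPC string really holds UTF-8, and the Delphi branch is untested here.

```pascal
program CountCodePoints;

{$IFDEF FPC}{$mode delphi}{$ENDIF}

const
  {$IFDEF FPC}
  Sample = 'caf'#$C3#$A9;   // "caf" + U+00E9 as two UTF-8 bytes
  {$ELSE}
  Sample = 'caf'#$00E9;     // "caf" + U+00E9 as one UTF-16 code unit
  {$ENDIF}

// Hypothetical helper: counts Unicode code points in S.
function CodePointCount(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
  begin
    {$IFDEF FPC}
    // UTF-8: count every byte that is not a continuation byte (10xxxxxx).
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
    {$ELSE}
    // UTF-16: count every code unit that is not a low surrogate.
    if (Ord(S[I]) < $DC00) or (Ord(S[I]) > $DFFF) then
      Inc(Result);
    {$ENDIF}
  end;
end;

begin
  WriteLn(CodePointCount(Sample));
end.
```

The calling code stays the same for both compilers; only the body of the helper knows which encoding it is looking at. For real projects the LazUTF8 unit already offers UTF-8 helpers, so a hand-written loop like this is only meant to show where the IFDEF goes.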