Re: [Lazarus] String vs WideString
On 17/08/17 at 01:34, Graeme Geldenhuys via Lazarus wrote:
> On 2017-08-16 19:26, Luca Olivetti via Lazarus wrote:
>> I mean, TBytes is just an "array of char".
>
> NO! Char can now mean a 1-byte char or a 2-byte char (I don't know how

Sorry, I meant "array of byte".
The point is it doesn't have all the features of a string.

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010) Fax +34 93 5883007
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] String vs WideString
On 17.08.2017 16:34, Graeme Geldenhuys via Lazarus wrote:
> On 2017-08-17 13:40, Marcos Douglas B. Santos via Lazarus wrote:
>> Sorry, but every single warning is a... warning... that needs to be resolved.
>
> I feel exactly the same. :-) It took me ages to figure out how to change my code so I could get rid of the "variable not initialized" whenever you used FillChar().

And what do you use? The Default() intrinsic function?

Ondrej
Re: [Lazarus] String vs WideString
On 2017-08-17 13:40, Marcos Douglas B. Santos via Lazarus wrote:
> Sorry, but every single warning is a... warning... that needs to be resolved.

I feel exactly the same. :-) It took me ages to figure out how to change my code so I could get rid of the "variable not initialized" warning whenever you use FillChar().

Regards,
Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key: http://tinyurl.com/graeme-pgp
Re: [Lazarus] String vs WideString
On 17.08.2017 12:17, "Bart via Lazarus" <lazarus@lists.lazarus-ide.org> wrote:
>
> On 8/17/17, Sven Barth via Lazarus wrote:
>
> >> really? delphi came from TP/BP... i was (still am, actually) using
> >> dynamic arrays in TP6 ;)
> >
> > Dynamic arrays in the form of "array of Type" were only introduced in
> > Delphi 3 if I remember correctly. Anything before that needed manual memory
> > management.
>
> I had D3 Pro, and this did definitely NOT support dynamic arrays.
> (Even String still was ShortString.)
> All arrays had to be fixed range.
> The often used construct to bypass this limitation was: Array[0..0] of
> TSomeType and have Range checking off.

Then it was Delphi 4 ^^'

Regards,
Sven
Re: [Lazarus] String vs WideString
On 17.08.2017 14:32, "Michael Schnell via Lazarus" <lazarus@lists.lazarus-ide.org> wrote:
>
> On 17.08.2017 12:09, Bart via Lazarus wrote:
>>
>> "Variables of the ordinal type Char are used to store ASCII characters."
>
> According to this wording, using Windows with ANSI character set would be a no-go.

Bart quoted from the TP help. And TP was written for DOS. There wasn't any Unicode or ANSI around yet...

Regards
Sven
Re: [Lazarus] String vs WideString
On Wed, Aug 16, 2017 at 12:38 PM, Juha Manninen via Lazarus wrote:
> On Wed, Aug 16, 2017 at 5:48 PM, Marcos Douglas B. Santos via Lazarus
>> Are you saying that I need to do this?
>> (following the first example on this thread)
>
> No, if the parameter is WideString, not a pointer PWideChar, you can
> just call it like you did. Suppress the warning as Mattias told if it
> bothers you. You can also make a helper function so the conversion
> happens in one place.
> Yes, for OLE you need WideString.

"Suppress the warning as Mattias told if it bothers you"

Of course it bothers me.
Sorry, but every single warning is a... warning... that needs to be resolved.
If this is not a problem (or a possible future problem), the compiler should not give us a warning, right?

Best regards,
Marcos Douglas
Re: [Lazarus] String vs WideString
On 17.08.2017 12:41, Tony Whyman via Lazarus wrote:
> Finally: "In UTF-16, code points greater or equal to 2^16 are encoded using /two/ 16-bit code units.

2¹⁵ ???

-Michael
Re: [Lazarus] String vs WideString
On 17.08.2017 12:41, Tony Whyman via Lazarus wrote:
> UCS-2 differs from UTF-16 by being a constant length encoding and only capable of encoding characters of BMP, it is supported by many programs."

Rather obviously, Embarcadero primarily had UCS-2 in mind when they created the "Unicode aware" Delphi. While it does in fact support full Unicode, keeping MyChar:=MyString[i] in place invites "unaware" programmers to presume UCS-2 encoded text.

-Michael
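The concern about MyChar:=MyString[i] can be shown with a small sketch. It is written in Python rather than Pascal and only models a UTF-16 string as a list of 16-bit code units; the helper names `utf16_code_units` and `is_surrogate` are invented for this illustration and say nothing about how Delphi or FPC implement strings.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def is_surrogate(unit):
    """True if the 16-bit unit is one half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDFFF

# Text inside the BMP: one code unit per code point, so indexing "just works".
bmp_only = utf16_code_units("abc")

# U+1F600 lies outside the BMP and occupies two code units.
with_emoji = utf16_code_units("a\U0001F600b")

print(len(bmp_only))                # 3 code units for 3 code points
print(len(with_emoji))              # 4 code units for 3 code points
print(is_surrogate(with_emoji[1]))  # index 1 is only half a character
```

Code that assumes one code unit per character keeps working on BMP-only text, which is why the assumption can go unnoticed for a long time.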
Re: [Lazarus] String vs WideString
On 17.08.2017 12:09, Bart via Lazarus wrote:
> Variables of the ordinal type Char are used to store ASCII characters."

According to this wording, using Windows with the ANSI character set would be a no-go.

-Michael
Re: [Lazarus] String vs WideString
On 16/08/17 11:05, Juha Manninen via Lazarus wrote:
>> 2. Clean up the char type.
>> ...
>> Why shouldn't there be a single char type that intuitively represents a
>> single character regardless of how many bytes are used to represent it.
>
> What do you mean by "a single character"?
> A "character" in Unicode can mean about 7 different things. Which one is your pick?
> This question is for everybody in this thread who used the word "character".

Are you making my points for me? If such a basic term as "character" means 7 different things then something is badly amiss. It should be fairly obvious that in this context, character = printable symbol - whilst for practical reasons allowing for format control characters such as "end of line" and "end of string".

I believe that you need to go back to the idea that you have an abstract representation of a character with a constant semantic, separate from the actual encoding, and for which there may be many different and valid encodings. For example, using a somewhat dated comparison, a lower case Latin alphabet letter 'a' should always have a constant semantic, but in ASCII it is encoded as decimal 97, while in EBCDIC it is encoded as decimal 129. Even though they have different binary values, they represent the same abstract character.

I want a 'char' type in Pascal to represent a character such as a lower case 'a' regardless of the encoding used. Indeed, for a program to be properly portable, the programmer should not have to care about the actual encoding - only that it is a lower case 'a'.

Hence my proposal that a character type should include an implicit or explicit attribute that records the encoding scheme used - which could vary from ASCII to UTF-32. You can then go on to define a text string as an array of characters with the same encoding scheme.

>> Yes, in a world where we have to live with UTF8, UTF16, UTF32, legacy code
>> pages and Chinese variations on UTF8, that means that dynamic attributes
>> have to be included in the type.
>> But isn't that the only way to have consistent and intuitive character handling?
>
> What do you mean? Chinese don't have a variation of UTF8.
> UTF8 is global unambiguous encoding standard, part of Unicode.

I was referring to GB 18030 and that it has one, two and four byte code points.

> The fundamental problem is that you want to hide the complexity of
> Unicode by some magic String type of a compiler. It is not possible.
> Unicode remains complex but the complexity is NOT in encodings!
> No, a codepoint's encoding is the easy part. For example I was easily
> able to create a unit to support encoding agnostic code. See unit
> LazUnicode in package LazUtils.
> The complexity is elsewhere:
> - "Character" composed of codepoints in precomposed and decomposed
>   (normalized) forms.
> - Compare and sort text based on locale.
> - Uppercase / Lowercase rules based on locale.
> - Glyphs
> - Graphemes
> - etc.
>
> I must admit I don't understand well those complex parts.
> I do understand codeunits and codepoints, and I understand they are
> the easy part.
>
> Juha

The point I believe that you are missing is to consider that a character is an abstract symbol with a semantic independent of how it is encoded. Collation sequences are independent of encoding and should remain the same regardless of how a character set is encoded.
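The figures mentioned in this message can be checked with a short sketch. It uses Python's built-in codecs purely as a convenient test bench, not as a model of any Pascal string type. Two assumptions are made here: `cp037` (one common EBCDIC code page) stands in for "EBCDIC", and the three sample characters were picked for illustration only.

```python
import unicodedata

# One abstract character, two encodings, two different byte values.
letter = "a"
ascii_value = letter.encode("ascii")[0]    # decimal 97
ebcdic_value = letter.encode("cp037")[0]   # decimal 129 in EBCDIC code page 037

# GB 18030 uses one, two or four bytes per code point.
samples = ["a", "\u4e2d", "\U0001F600"]    # Latin letter, CJK ideograph, emoji
gb18030_lengths = [len(s.encode("gb18030")) for s in samples]

# The same user-visible "character" as one code point or as two.
precomposed = "\u00e9"     # e with acute accent, a single code point
decomposed = "e\u0301"     # 'e' followed by a combining acute accent
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

print(ascii_value, ebcdic_value)
print(gb18030_lengths)
print(precomposed == decomposed, same_after_nfc)
```

The last line is the part Juha calls the hard bit: the two spellings compare unequal as raw code point sequences and only match after normalization, whichever encoding stores them.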
Re: [Lazarus] String vs WideString
On 16/08/17 11:05, Juha Manninen via Lazarus wrote:
> On Mon, Aug 14, 2017 at 4:21 PM, Tony Whyman via Lazarus wrote:
>> UTF-16/Unicode can only store 65,536 characters while the Unicode standard
>> (that covers UTF8 as well) defines 136,755 characters.
>> UTF-16/Unicode's main advantage seems to be for rapid indexing of large
>> strings.
>
> That shows complete ignorance from your side about Unicode.
> You consider UTF-16 as a fixed-width encoding. :(
> Unfortunately many other programmers had the same wrong idea or they
> were just lazy. The result anyway is a lot of broken UTF-16 code out
> there.

You do like to use the word "ignorance", don't you?

You can if you want take the view that all the "other programmers" that got the wrong idea are "stupid monkeys that don't know any better" or, alternatively, that they just wanted a nice cup of tea rather than the not quite tea drink that was served up.

Wikipedia sums the problem up nicely: "The early 2-byte encoding was usually called "Unicode", but is now called "UCS-2". UCS-2 differs from UTF-16 by being a constant length encoding and only capable of encoding characters of BMP, it is supported by many programs."

This is where the problem starts. The definition of "Unicode" was changed (foolishly in my opinion) after it had been accepted by the community and the result is confusion. Hence my first point about not even using it.

In using "UTF16/Unicode" I was attempting to convey the common use of the term, which is to see UTF-16 as what is now defined as UCS-2. This is because hardly anyone I know uses the name UCS-2 and instead says "Unicode". Perhaps I just spend too much time amongst the ignorant.

Wikipedia also makes the wonderful point that "The UTF-16 encoding scheme was developed as a compromise to resolve this impasse in version 2.0". The impasse having resulted from "4 bytes per character wasted a lot of disk space and memory, and because some manufacturers were already heavily invested in 2-byte-per-character technology".

Finally: "In UTF-16, code points greater or equal to 2^16 are encoded using /two/ 16-bit code units. The standards organizations chose the largest block available of un-allocated 16-bit code points to use as these code units (since most existing UCS-2 data did not use these code points and would be valid UTF-16). Unlike UTF-8 they did not provide a means to encode these code points".

Which is where I get my own view that UTF-16, as defined by the standards, is pointless. If you keep it to a UCS-2 (like) subset then you can get rapid indexing of character arrays. But as soon as you introduce the possibility of some characters being encoded as two 16-bit units then you lose rapid indexing and I can see no advantage over UTF-8 - plus you get all the fun of worrying about byte order.

Indeed, I believe those lazy programmers that you referred to are actually making a conscious decision to work with a 16-bit code point only UTF-16 subset (i.e. the Basic Multilingual Plane) precisely so that they can do rapid indexing. As soon as you bring in 2 x 16-bit code unit code points, you lose that benefit - and perhaps you should be using UTF-32.

IMHO, Linux has got it right by using UTF-8 as the standard for character encoding and one of Lazarus's USPs is that it follows that lead - even for Windows. I can see why a program that does intensive text scanning will use a UTF-16 constrained to the BMP (i.e. 16-bit only), but not why anyone would prefer an unconstrained UTF-16 over UTF-8.
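The two-code-unit rule quoted from Wikipedia comes down to a little arithmetic, sketched below in Python as a worked example. The threshold is 0x10000, i.e. 2^16. The function name `to_surrogate_pair` is made up for this sketch; the result is cross-checked against Python's own UTF-16 codec, and the last lines show the byte-order issue mentioned in the message.

```python
def to_surrogate_pair(code_point):
    """Split a code point at or above 2^16 into two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point fits in a single 16-bit code unit")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))                 # 0xd83d 0xde00

# Cross-check against the standard library's big-endian UTF-16 encoder.
expected = "\U0001F600".encode("utf-16-be")
matches_codec = expected == high.to_bytes(2, "big") + low.to_bytes(2, "big")

# Byte order matters for UTF-16 but not for UTF-8.
little_endian = "A".encode("utf-16-le")    # b'A\x00'
big_endian = "A".encode("utf-16-be")       # b'\x00A'
```

Both halves land in the reserved range 0xD800..0xDFFF, which is what lets a decoder tell a pair apart from two ordinary BMP characters.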
Re: [Lazarus] String vs WideString
On 8/17/17, Sven Barth via Lazarus wrote:
>> really? delphi came from TP/BP... i was (still am, actually) using
>> dynamic arrays in TP6 ;)
>
> Dynamic arrays in the form of "array of Type" were only introduced in
> Delphi 3 if I remember correctly. Anything before that needed manual memory
> management.

I had D3 Pro, and this did definitely NOT support dynamic arrays.
(Even String still was ShortString.)
All arrays had to be fixed range.
The often used construct to bypass this limitation was: Array[0..0] of TSomeType and have Range checking off.

Bart
Re: [Lazarus] String vs WideString
On 8/17/17, Luca Olivetti via Lazarus wrote:
> I started using strings as communication buffers since delphi 2. There
> weren't even dynamic arrays then...

From the Turbo Pascal Help:
"A string type variable is a sequence of characters ..."
And then when you click on "characters":
"Char type --- Variables of the ordinal type Char are used to store ASCII characters."

None of this suggests that string is a good type for storing arbitrary byte sequences.
You misused an implementation detail of the type (Ansi)String. And now you blame fpc.
You should have used a sane type for your buffer from the start.

Bart
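Why a text type makes a fragile byte buffer can be sketched by analogy. The example below is Python, not Pascal, and does not describe how FPC converts AnsiString values; it only shows the general hazard that arbitrary bytes stop being the same bytes once an encoding conversion is applied to them. The four-byte `payload` is a made-up communication frame.

```python
# A hypothetical device frame: not text in any encoding, just bytes.
payload = bytes([0x02, 0xFF, 0x80, 0x03])

# Treated as UTF-8 text, the frame is simply invalid.
try:
    payload.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Treated as Latin-1 text and then converted to UTF-8, the frame survives
# as "characters" but its bytes and its length change.
as_text = payload.decode("latin-1")
converted = as_text.encode("utf-8")

# A dedicated byte container carries no encoding and is never converted.
buffer = bytearray(payload)
buffer.append(0x04)

print(decodes_as_utf8)                # False
print(len(payload), len(converted))   # 4 6
print(bytes(buffer[:4]) == payload)   # True
```

A byte-array type sidesteps the problem because nothing ever interprets its contents as characters.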
Re: [Lazarus] dynamic string proposal
On 17.08.2017 11:11, "Michael Schnell via Lazarus" <lazarus@lists.lazarus-ide.org> wrote:
>
> Maybe, Sven could answer to this mail in the other thread...
>

I provided an example in my answer to Tony Whyman in the same subbranch of the thread.

Regards,
Sven
Re: [Lazarus] String vs WideString
On 17.08.2017 11:21, "Michael Schnell via Lazarus" <lazarus@lists.lazarus-ide.org> wrote:
>
> On 16.08.2017 22:40, Sven Barth via Lazarus wrote:
>>
>> Trunk supports Insert() and Delete() on dynamic arrays, Concat() and + are on the near term ToDo list.
>
> Supposedly "pos", as well. But that does not really help if we don't have a TStringList workalike, and supposedly several more library functions.
>
> That is why I feel empowering the string paradigm for such use would be more appropriate. (See the thread "dynamic string proposal").

Why do you want to stuff everything and the kitchen sink into TStrings? There are much more suitable and less specialized container types available for this (to name a few: TFPGList, TList<>, etc.).

Regards,
Sven
Re: [Lazarus] String vs WideString
On 16.08.2017 22:40, Sven Barth via Lazarus wrote:
> Trunk supports Insert() and Delete() on dynamic arrays, Concat() and + are on the near term ToDo list.

Supposedly "pos", as well. But that does not really help if we don't have a TStringList workalike, and supposedly several more library functions.

That is why I feel empowering the string paradigm for such use would be more appropriate. (See the thread "dynamic string proposal").

-Michael
Re: [Lazarus] String vs WideString
On 16.08.2017 20:26, Luca Olivetti via Lazarus wrote:
> Call me lazy but I don't want to reinvent the wheel and re-implement from scratch the functionality that a plain ansistring provides and TBytes to this day doesn't.

So please continue in the thread "dynamic string proposal". Exactly this is part of what is discussed there.

-Michael
Re: [Lazarus] dynamic string proposal
Maybe Sven could answer this mail in the other thread...

On 14.08.2017 18:47, Sven Barth via Lazarus wrote:
> The main problem of such a dynamic type would be the inability to do fast indexing as the compiler would need to insert runtime checks for the size of a character.

What "indexing" do you have in mind? Could you give an example where such a difference is supposed to become important?

(As you know, I wrote a paper where I claimed the contrary. I'd like to revise it if necessary.)

-Michael