Re: [fpc-devel] Unicodestring branch, please test and help fixing #2
After reading the mailing lists some more, I've played with this a little. At first sight (it needs more testing), Zeos (MySQL 5) works out of the box if ZConnection.Properties is set to:

character_set_client=utf8
character_set_connection=utf8
character_set_database=utf8
character_set_results=utf8
character_set_server=utf8
character_set_system=utf8
collation_connection=utf8_general_ci
collation_database=utf8_general_ci
collation_server=utf8_general_ci
Codepage=utf8

(suggested by Ivan Gan on the Lazarus mailing list back in July). Whether or not the database/table is UTF-8 encoded, the returned values are OK for the Lazarus controls as well as for FieldByName(...).AsString assignments. For me it seems to work both ways (reading from the tables and writing to them with SQL statements).

With fcl-db (the SQLdb Lazarus package), nothing I tried worked out of the box. Unless I used the ConvertEncoding(SomeStringFromTheTable, 'cp1252', 'utf8') function, I could not get any string value from the table fields, again regardless of whether the database/table was UTF-8 encoded. I'm sure it can be made to work somehow.

AB

Joost van der Sluis wrote:
On Friday 12-09-2008 at 15:56 [timezone +0200], Mattias Gärtner wrote:
Quoting Joost van der Sluis <[EMAIL PROTECTED]>:
On Friday 12-09-2008 at 13:22 [timezone +0200], JoshyFun wrote:

A> Thanks for pointing me to the Lazarus thread about this and the bug
A> report. Checked them.
A> But as I understand there is no solution available at the moment for this.

I had partially solved the problem using the "OnGetText" handler (I'm not sure about the name) of each field that is somehow dirty, forcing a codepage-to-UTF-8 conversion (in Lazarus you will find some codepage<->UTF-8 conversions available). I think the original poster didn't look very well in the archives; this solution is mentioned here quite often.

A> I have a database that is not encoded utf8 (and it will never be because
A> other client programs are accessing it and their users do not want/need
A> to be converted to unicode). How do I get the field values into
A> FPC/Lazarus into a string variable? Right now the non-unicode strings
A> are returned as empty from a database field due to FCL conversion functions.

If you need this as a fixed solution for this project, you could create a new database unit based on the current one (with a different name, of course) that hardcodes a codepage-to-UTF-8 conversion for each string once it is retrieved from the database. Take care with string lengths, as the UTF-8 strings will be equal to or longer than the original ones.

You can just override one single method to do this. This has also been mentioned a few times on this list.

Maybe it is not documented in the right place?

It is not documented at all. Just like the rest of the database stuff. But maybe I should write a FAQ for fpc. With the new Lazarus versions using UTF-8 by default, this is asked quite often.

Joost
___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
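Here is a minimal Free Pascal sketch of the per-string codepage-to-UTF-8 conversion discussed above. It assumes the stored data is ISO-8859-1 (Latin-1). The helper name Latin1ToUtf8 is hypothetical and is not an FCL or LCL routine. CP1252 differs from Latin-1 only in the range $80..$9F, which this sketch does not remap; AB's ConvertEncoding(..., 'cp1252', 'utf8') call from Lazarus's LConvEncoding unit covers that case. In an application, a conversion like this would be called from each field's OnGetText handler (TField.OnGetText in the FCL db unit), so that DB-aware controls receive valid UTF-8. The sketch also shows why the UTF-8 result is never shorter than the original: every byte above $7F becomes two bytes.

```pascal
program latin1toutf8demo;

{$mode objfpc}{$H+}

// Hypothetical helper (not an FCL/LCL routine): converts ISO-8859-1 (Latin-1)
// bytes to UTF-8. Bytes $80..$9F are treated as Latin-1 control codes, so the
// CP1252-specific characters in that range (e.g. the euro sign at $80) are
// NOT handled; a lookup table would be needed for full CP1252 support.
function Latin1ToUtf8(const S: RawByteString): RawByteString;
var
  i, n: Integer;
  b: Byte;
begin
  // Worst case: every input byte becomes two output bytes.
  SetLength(Result, 2 * Length(S));
  n := 0;
  for i := 1 to Length(S) do
  begin
    b := Ord(S[i]);
    if b < $80 then
    begin
      Inc(n);
      Result[n] := AnsiChar(b);                  // ASCII is identical in UTF-8
    end
    else
    begin
      Inc(n);
      Result[n] := AnsiChar($C0 or (b shr 6));   // lead byte 110xxxxx
      Inc(n);
      Result[n] := AnsiChar($80 or (b and $3F)); // continuation byte 10xxxxxx
    end;
  end;
  SetLength(Result, n);  // trim to the bytes actually written
end;

var
  u: RawByteString;
begin
  u := Latin1ToUtf8('Stra'#$DF'e');  // 'Straße' as Latin-1 bytes (6 bytes)
  WriteLn(Length(u));                // 7: the sharp s needs two bytes in UTF-8
end.
```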
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Florian Klaempfl wrote:
> No. First, imo it's more likely that the next release will be 2.4.0 and not 2.2.4; further, the changes are too big.

Do you have a todo list for it? IOW, what is missing before 2.4.0 can be released? (We need the resource changes ;) )

Best regards,
Paul Ishenin.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Martin Schreiber wrote:
> On Sunday 14 September 2008 19.22:13 Florian Klaempfl wrote:
> > Martin Schreiber wrote:
> > > I tried with trunk, same result. The problem is probably that the second
> > > constant string parameter has a wrong reference count. It is initially 0
> > > instead of -1. The incref call at the beginning of winfilepath turns it to 1,
> > > and the decref in the finalization section of winfilepath tries to free the
> > > constant string memory -> boom.
> >
> > Fixed in rev 11779. Thanks for the test.
>
> Win32 MSEide works now with UnicodeString, no problems found up to now. :-)
> Thanks a lot!
> Do you plan to merge to fixes_2_2?

No. First, imo it's more likely that the next release will be 2.4.0 and not 2.2.4; further, the changes are too big.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Sunday 14 September 2008 19.22:13 Florian Klaempfl wrote:
> Martin Schreiber wrote:
> > I tried with trunk, same result. The problem is probably that the second
> > constant string parameter has a wrong reference count. It is initially 0
> > instead of -1. The incref call at the beginning of winfilepath turns it to 1,
> > and the decref in the finalization section of winfilepath tries to free the
> > constant string memory -> boom.
>
> Fixed in rev 11779. Thanks for the test.

Win32 MSEide works now with UnicodeString, no problems found up to now. :-)
Thanks a lot!
Do you plan to merge to fixes_2_2?

Martin
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Martin Schreiber wrote:
> On Thursday 11 September 2008 23.18:07 Florian Klaempfl wrote:
> > Martin Schreiber wrote:
> > > On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote:
> > >
> > > I have a crash in MSEide startup in a procedure finalization section:
> > > [...]
> > > I saw that you merged unicodestring to trunk. Should I test with trunk
> > > instead of unicodestring branch?
> >
> > Yes. Unicodestring branch is closed.
>
> I tried with trunk, same result. The problem is probably that the second
> constant string parameter has a wrong reference count. It is initially 0
> instead of -1. The incref call at the beginning of winfilepath turns it to 1,
> and the decref in the finalization section of winfilepath tries to free the
> constant string memory -> boom.

Fixed in rev 11779. Thanks for the test.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Thursday 11 September 2008 23.18:07 Florian Klaempfl wrote:
> Martin Schreiber wrote:
> > On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote:
> >
> > I have a crash in MSEide startup in a procedure finalization section:
> > [...]
> > I saw that you merged unicodestring to trunk. Should I test with trunk
> > instead of unicodestring branch?
>
> Yes. Unicodestring branch is closed.

I tried with trunk, same result. The problem is probably that the second constant string parameter has a wrong reference count. It is initially 0 instead of -1. The incref call at the beginning of winfilepath turns it to 1, and the decref in the finalization section of winfilepath tries to free the constant string memory -> boom.

Test result:

-1
1
An unhandled exception occurred at $77892373 :
EAccessViolation : Access violation
  $77892373
  $778922F8
  $0040A214
  $004097A3
  $004098DD
  $004099DB
  $00408844
  $004068EA
  $0040696D
  $00401858
  $004018D5

The crash stack:

#0 77892373 :0 ??()
#1 00416E04 :0 U_SYSTEM_ENTRYINFORMATION()
#2 004196D4 :0 U_SYSINITPAS_ENTRYINFORMATION()
#3 004162C4 :0 U_SYSTEM_OUTPUT()
#4 014CFE20 :0 ??()
#5 00408799 sysheap.inc:38 SYSOSALLOC(SIZE=0)
#6 778922F8 sysheap.inc:0 ??()
#7 00416F00 sysheap.inc:0 U_SYSTEM_ORPHANED_FREELISTS()
#8 0040A762 systhrd.inc:300 SYSENTERCRITICALSECTION(CS=void)
#9 0040A214 thread.inc:190 ENTERCRITICALSECTION(CS={DEBUGINFO = 0x0, LOCKCOUNT = 1, RECURSIONCOUNT = 0, OWNINGTHREAD = 0, LOCKSEMAPHORE = 944, SPINCOUNT = 0})
#10 004097A3 heap.inc:1034 WAITFREE_VAR(PMCV=0x41312c)
#11 004098DD heap.inc:1086 SYSFREEMEM_VAR(LOC_FREELISTS=0x416f84, PMCV=0x41312c)
#12 004099DB heap.inc:1125 SYSFREEMEM(P=0x413138)
#13 00408844 heap.inc:275 FREEMEM(P=0x413138)
#14 004068EA ustrings.inc:179 DISPOSEUNICODESTRING(S=0x413138)
#15 0040696D ustrings.inc:206 fpc_unicodestr_decr_ref(S=0x413138)
#16 00401858 decrefcrash.pas:63 WINFILEPATH(DIRNAME=0x0, FILENAME=0x413138, result=0x14fdab0)
#17 004018D5 decrefcrash.pas:69 main()

And there are calls to fpc_WideStr_Decr_Ref I don't understand.

Test program attached.

Martin

program decrefcrash;

{$ifdef FPC}{$mode objfpc}{$h+}{$endif}
{$ifdef mswindows}{$apptype console}{$endif}

uses
  {$ifdef FPC}{$ifdef linux}cthreads,{$endif}{$endif}
  sysutils;

const
  maxdatasize = $7fff;

type
  msechar = unicodechar;
  msestring = unicodestring;
  msecharaty = array[0..maxdatasize div sizeof(msechar)-1] of msechar;
  pmsecharaty = ^msecharaty;

procedure replacechar1(var dest: msestring; a,b: msechar);
 //replaces a by b
var
  int1: integer;
begin
  uniquestring(dest);
  for int1:= 0 to length(dest)-1 do begin
    if pmsecharaty(dest)^[int1] = a then begin
      pmsecharaty(dest)^[int1]:= b;
    end;
  end;
end;

function winfilepath(dirname,filename: msestring): msestring;
begin
  //print the reference counts stored in the string headers
  writeln((pptrint(pointer(dirname))-2)^);
  flush(output);
  writeln((pptrint(pointer(filename))-2)^);
  flush(output);
  replacechar1(dirname,msechar('/'),msechar('\'));
  replacechar1(filename,msechar('/'),msechar('\'));
  if (length(dirname) >= 3) and (dirname[1] = '\') and
                                        (dirname[3] = ':') then begin
    dirname[1]:= dirname[2]; // '/c:' -> 'c:\'
    dirname[2]:= ':';
    dirname[3]:= '\';
    if (dirname[4] = '\') and (length(dirname) > 4) then begin
      move(dirname[5],dirname[4],(length(dirname) - 4)*sizeof(msechar));
      setlength(dirname,length(dirname) - 1);
    end;
  end;
  if filename <> '' then begin
    if dirname = '' then begin
      result:= '.\'+filename;
    end
    else begin
      if dirname[length(dirname)] <> '\' then begin
        result:= dirname + '\' + filename;
      end
      else begin
        result:= dirname + filename;
      end;
    end;
  end
  else begin
    result:= dirname;
  end;
end;

var
  mstr1,mstr2: msestring;
begin
  mstr2:= 'C:\Dokumente und Einstellungen\mseca\Anwendungsdaten\.mseide';
  mstr1:= winfilepath(mstr2,'*');
end.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
> It is not documented at all. Just like the rest of the database stuff.
> But maybe I should write a FAQ for fpc. With the new Lazarus versions
> using UTF-8 by default, this is asked quite often.

This would be really nice. I know I'm not the only one who doesn't want to spend days hacking and debugging the components and FCL code to find out why database field values disappear or morph before reaching my program code, when they didn't before. People will start using these new Unicode-based development tools, and this problem will be there for all of them. The problem is not only with the DB-aware components; it also happens when you take a simple FieldByName(...).AsString and put it into a normal control.

A transparent solution would be best, e.g. the FCL converting to and from the database codepage automatically when asked to, but I guess that is too much to ask for. :) Maybe it's not even possible.

Thank you for the help, guys. I'll try to dig up more info from the mailing list archives when I have time.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Sorry, but I meant comparing with collation. I did not mean comparing within language context.

How can you do /proper/ collation while ignoring the language context?

1) 'sıkıcı' which means 'boring' in English (notice the dotless small 'i's)
2) 'sikici' which means 'fucker' in English

Depends how you normalize. Normalize should substitute all *equal* letters (or combinations thereof) with one single form. That allows comparing and matching them.

Again, we're not quite on the same page here... What you're referring to is more like 'Text Normalization' [ http://en.wikipedia.org/wiki/Text_normalization ], where you definitely need a very comprehensive dictionary so that '1' is equal to 'one' and '1st' is 'first', etc. (if your language is English). Whereas what I am referring to is 'Unicode Normalization' [ http://en.wikipedia.org/wiki/Unicode_normalization ]. This one is much narrower in scope. It deals basically with what I will call 'character glyphs'.

Now, from what I understand from the definitions of 'Unicode Normalization', there are 2 ways of doing it:

1) You decompose both texts (so that all 'weird' characters are expanded into their combining characters)
2) You compose both texts (so that you have as few combining characters as possible, or none)

This is done, obviously, to get them both into the same format, to make them easier to compare. If you do no other operation on these two texts before you compare them, this is called a Canonical Equivalence Test: each 'character glyph' in each text must be the same. For a Canonical Equivalence Test, you do not need any 'language' attribute; after all, you're doing a simple byte-wise test.

On the other hand, if you wish to do a broader comparison (a Compatibility Equivalence Test or something else), you will need to do a little more work on those texts: Normalization is one of them. I suggest you take a look at the 'Normalization' heading under http://en.wikipedia.org/wiki/Unicode_normalization

Trouble with the 'Normalization' described there is that it is far too crude for quite a lot of purposes. A better form of comparison is converting both texts to either uppercase or lowercase. And once we do this, we hit two walls (or obstacles) to overcome. The steps I can think of are:

1) Equivalent code points. We first need to 'compose' the text and then substitute the relevant (and preferred) equivalent code points for any 'character glyphs' in the texts.
2) We also need to take care of things like language-dependent case transforms. See http://en.wikipedia.org/wiki/Turkish_dotted_and_dotless_I

As far as I know, this is the only 'proper' way to do search and comparison operations under Unicode. I know it will be slower, but that is the price to pay.

Note: The reason I used the term 'character glyphs' is that several code points can be combined to make a 'character glyph'. See the definition of Code Point [ http://unicode.org/glossary/ ], which says: "Code Point: Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF₁₆." As an example, from the above Wiki article, we can use 2 code points to produce a 'character glyph', such as 'n' + '~' --> ñ

But yes, even this is very limited (busstop), because even if you know the language of the word (German in my example) you do not know its meaning.

You do not worry about the meaning at all. In all languages (I guess) there are several words that may be written the same but mean different things.

Without a full dictionary, you do not know if ss and German sharp-s are the same or not.

True. But if you do know it is in German, then you definitely know they are. And this makes a lot of difference.

So basically what you want to do can only be done with a full dictionary. Or you have to accept false positives.

Nope. No false positives at the text level. You can always, of course, get false positives at the semantic level, such as when you're looking for 'apple' (the fruit) and 'Apple' (the brand name), but that's a completely different problem.

I also fail to see why a utf8 string is a half-baked solution. It will serve most people fine. It can be extended for those who want more.

I have nothing against UTF-8 or any other encoding scheme. It is just that: an encoding scheme. Most handy as a means of transporting data from one medium/app to another. But UTF-8 does not in any way cover the whole of Unicode, nor is it a complete solution for dealing with Unicode. It is, after all, an encoding scheme.

BUT of course there is no way to deal with the ambiguous "Busstop"

Not even if you knew that "Busstop" was a German string?

Indeed. For this case, you need to know what language "Busstop" was written in.

You need a dictionary. Knowing it is German is not enough, because all that "it is German" tells you is that "ss" may be a sharp-s, but doesn't have to be.

A dictionary, then, wouldn't help you eit
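To make the canonical-equivalence idea concrete (a precomposed ñ versus 'n' followed by a combining tilde), here is a deliberately tiny Free Pascal sketch. It composes only two base+mark pairs from a hard-coded table. Real Unicode normalization (NFC/NFD) also needs the full decomposition data, canonical reordering of combining marks and composition exclusions, so treat ComposeSimple and CanonicallyEqual as hypothetical illustrative names, not an existing RTL API. The point it shows is the one made above: once both texts are composed, plain code-point comparison is enough, and no language information is involved.

```pascal
program composedemo;

{$mode objfpc}{$H+}

type
  TComposition = record
    Base, Mark, Composed: UnicodeChar;
  end;

const
  // Tiny illustrative subset of the Unicode canonical composition data.
  Compositions: array[0..1] of TComposition = (
    (Base: 'n'; Mark: UnicodeChar($0303); Composed: UnicodeChar($00F1)), // n + combining tilde -> ñ
    (Base: 'e'; Mark: UnicodeChar($0301); Composed: UnicodeChar($00E9))  // e + combining acute -> é
  );

// Replaces every base+mark pair found in the table by its precomposed form.
function ComposeSimple(const S: UnicodeString): UnicodeString;
var
  i, k: Integer;
  merged: Boolean;
begin
  Result := '';
  i := 1;
  while i <= Length(S) do
  begin
    merged := False;
    if i < Length(S) then
      for k := Low(Compositions) to High(Compositions) do
        if (S[i] = Compositions[k].Base) and (S[i + 1] = Compositions[k].Mark) then
        begin
          Result := Result + Compositions[k].Composed;
          Inc(i, 2);
          merged := True;
          Break;
        end;
    if not merged then
    begin
      Result := Result + S[i];
      Inc(i);
    end;
  end;
end;

// Canonical equivalence test (for the pairs above only): compose, then compare.
function CanonicallyEqual(const A, B: UnicodeString): Boolean;
begin
  Result := ComposeSimple(A) = ComposeSimple(B);
end;

var
  decomposed, precomposed: UnicodeString;
begin
  decomposed := UnicodeString('Espan') + UnicodeChar($0303) + 'a';
  precomposed := UnicodeString('Espa') + UnicodeChar($00F1) + 'a';
  WriteLn(decomposed = precomposed);                  // FALSE: different code points
  WriteLn(CanonicallyEqual(decomposed, precomposed)); // TRUE
end.
```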
Re: [fpc-devel] Unicodestring branch, please test and help fixing
listmember wrote:

IMHO the discussion splits here between:
1) How can this be done in a specific app
2) What should fpc provide

As for 2: This would be on top of yet (afaik) missing basic functions such as Compare using collation x (where collation is given as an argument to compare, not as part of any string)

I think we're beginning to be on the same page, but please, can you refrain from using the word 'collation'? Every time I see it in this context, I feel a strong need to open the window and shout "collation isn't the most important/used part of a language wrt programming" :)

Sorry, but I meant comparing with collation. I did not mean comparing within language context. Language context is too complex to be basic (see busstop below).

2) Actual compare: you need to "normalize" all strings before comparing, then compare the normalized strings as bytes. Normalizing means deciding, for each char, how to represent it. German "ae" could be represented as an umlaut for the compare. Or (in German text) you expand all umlauts first.

IOW, SameText() and similar stuff must take normalization into account.

But you do know that 'normalization' is a very rough assumption and can land you in some very embarrassing situations. Here are 2 words from Turkish.

1) 'sıkıcı' which means 'boring' in English (notice the dotless small 'i's)
2) 'sikici' which means 'fucker' in English

Depends how you normalize. Normalize should substitute all *equal* letters (or combinations thereof) with one single form. That allows comparing and matching them.

But yes, even this is very limited (busstop), because even if you know the language of the word (German in my example) you do not know its meaning. Without a full dictionary, you do not know if ss and German sharp-s are the same or not. So basically what you want to do can only be done with a full dictionary. Or you have to accept false positives.

I also fail to see why a utf8 string is a half-baked solution. It will serve most people fine. It can be extended for those who want more. IMHO this is a case for an add-on library. And apparently no one has yet volunteered to write it.

Now, when you normalize these you get 'SIKICI' for both, which --then-- you would assume to be the same.

BUT of course there is no way to deal with the ambiguous "Busstop"

Indeed. For this case, you need to know what language "Busstop" was written in.

You need a dictionary. Knowing it is German is not enough, because all that "it is German" tells you is that "ss" may be a sharp-s, but doesn't have to be.

What I cannot do (or what I do not want to do) is to decide which of them other people want to use.

But isn't this just that? IOW, you're deciding what other people will NOT want to use if you throw the 'language' attribute (for each char) out of the window.

True, I am happy to do that.

NOT I am glad we have met :)

Have we? I remember a mail conversation, but not an actual meeting :) SCNR

Why, you can always extend this. Store your string in any of the following ways:
1) every 2nd char is a language attribute, not a char
2) store the language attributes in a 2nd string, always pass both strings around

Of course, these and even more creative hacks could be devised. The question is, is the language an attribute of a Unicode character? (I assume "mandatory attribute")

Well, as much as it is or is not an attribute of a Latin-1 or ISO-whatever char. I do not think it is. I have no proof. But a lot of people seem to think so: if I google Unicode (or any other char/latin/iso...) I get nice character tables, and no language info.
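The "expand all umlauts first" normalization can be sketched in a few lines of Free Pascal. GermanFold and SameTextGerman are hypothetical names. The function lowercases ASCII letters and expands ä/ö/ü/Ä/Ö/Ü/ß, which are written as explicit code points to sidestep source-encoding issues. It is only meaningful for German text. The second call in the main block reproduces the false positive raised in the thread: "Busstop" and "Bußtop" fold to the same string.

```pascal
program germanfold;

{$mode objfpc}{$H+}

// Hypothetical helper: lowercases ASCII letters and expands the German
// umlauts and sharp s (ä->ae, ö->oe, ü->ue, ß->ss) so that spellings can
// be compared code point by code point afterwards.
function GermanFold(const S: UnicodeString): UnicodeString;
var
  i: Integer;
  c: UnicodeChar;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    c := S[i];
    case Ord(c) of
      Ord('A')..Ord('Z'): Result := Result + UnicodeChar(Ord(c) + 32);
      $00E4, $00C4: Result := Result + 'ae';  // ä, Ä
      $00F6, $00D6: Result := Result + 'oe';  // ö, Ö
      $00FC, $00DC: Result := Result + 'ue';  // ü, Ü
      $00DF: Result := Result + 'ss';         // ß
    else
      Result := Result + c;
    end;
  end;
end;

function SameTextGerman(const A, B: UnicodeString): Boolean;
begin
  Result := GermanFold(A) = GermanFold(B);
end;

begin
  // 'Straße' built from an explicit code point
  WriteLn(SameTextGerman('I am on FoolStrasse',
                         'I am on FoolStra' + UnicodeChar($00DF) + 'e'));  // TRUE
  // False positive: 'Bußtop' is not a word, yet it matches 'Busstop'
  WriteLn(SameTextGerman('Busstop', 'Bu' + UnicodeChar($00DF) + 'top'));  // TRUE
end.
```

The same folding also makes the names "Heisse" and "Heiße" compare equal, which is exactly the case Martin describes later as not always wanted.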
Re: [fpc-devel] Unicodestring branch, please test and help fixing
[Note that here 'TCharacter' isn't necessarily an object; it might as well be a simple record structure.]

AFAIK for most programmers this is not a common task. Most programs need less (one language or codepage)

But when you're talking Unicode, codepage is rather meaningless, isn't it?

or more (phonetic, semantic, statistical search).

Can you explain why you think that this particular problem requires compiler magic?

See my other reply to Martin Friebe, in another sub-thread.

Are there, in Unicode, start-stop markers that denote 'language'? Is it needed? Are there any Unicode characters whose upper/lower case depends on the language?

Yes. See my other reply to Martin Friebe, in another sub-thread.

Take this, for example: "if SameText(SomeString, SomeOtherString) then do ..." For this to work properly, in both 'SomeString' and 'SomeOtherString', you need to know which language *each* character belongs to.

Comparing texts can be done with various meanings. For example: byte comparison, simple case-insensitive comparison, non-literal comparison, comparing like this library... Which one do you mean?

Byte comparison isn't what I am worried about. In every language, there are pretty well-known and fixed (by now) rules that apply to string comparison. I am referring to those rules.

[...]

Here is a simple example for you: "if SameText('I am on FoolStrasse', 'I am on FoolStraße') then do ..." Now, how are you going to decide that the SameText() function here returns true unless you have information that the substring 'FoolStraße' is in German?

The two strings have the same language, but are written with different spelling rules (Rechtschreibung). You need dictionaries and spelling systems to implement such comparisons. This is beyond a compiler or an RTL.

Are you sure? I was under the impression that Unicode covers these without needing further data.

What about loan words?

For all practical purposes, 'loan words' belong to the language they are used in, except when we are discussing etymology.

SameText('Istanbul', 'istanbul') can only return true when both 'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani. Otherwise, the same SameText() has to return false.

I doubt that it is that easy.

Well, I never said that it would be that easy. But if you strip the language attribute off the character, it will be impossible, or several orders of magnitude harder, for those people who need it. You can, of course, ignore all that. But then, what is the point of going Unicode? We were just fine doing things ANSI-centric... weren't we?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Actually, for your example case doesn't matter, as you need to decide if "ss" = "ß"

And this is only valid in German. For all others, the result must be either false or undefined.

Are there, in Unicode, start-stop markers that denote 'language'?

I do not know; that was why I said "unused unicode" and "implemented on top" (as part of the specific app).

As far as I know, there isn't a language delimiter in Unicode.

IMHO the discussion splits here between:
1) How can this be done in a specific app
2) What should fpc provide

As for 2: This would be on top of yet (afaik) missing basic functions such as Compare using collation x (where collation is given as an argument to compare, not as part of any string)

I think we're beginning to be on the same page, but please, can you refrain from using the word 'collation'? Every time I see it in this context, I feel a strong need to open the window and shout "collation isn't the most important/used part of a language wrt programming" :)

Take this, for example: "if SameText(SomeString, SomeOtherString) then do ..." For this to work properly, in both 'SomeString' and 'SomeOtherString', you need to know which language *each* character belongs to.

I would rather say: "There are special cases where you need/want to know which language"

Yes. And if we're on our way to making FPC Unicode-enabled, we need to take these special cases into account. Otherwise, we will likely end up with a half-baked 'solution'.

So I do not imply how special or non-special those cases are => you do not always need to know. (continued below on your example)

Why would I ALWAYS need it? Isn't 'needed when necessary' good enough?

2) Actual compare: you need to "normalize" all strings before comparing, then compare the normalized strings as bytes. Normalizing means deciding, for each char, how to represent it. German "ae" could be represented as an umlaut for the compare. Or (in German text) you expand all umlauts first.

IOW, SameText() and similar stuff must take normalization into account.

But you do know that 'normalization' is a very rough assumption and can land you in some very embarrassing situations. Here are 2 words from Turkish.

1) 'sıkıcı' which means 'boring' in English (notice the dotless small 'i's)
2) 'sikici' which means 'fucker' in English

Now, when you normalize these you get 'SIKICI' for both, which --then-- you would assume to be the same. Well... I'd like to see you (or your boss) when you've come up with all those 'fucker's instead of all those 'boring' old farts you were looking for :P [You can probably think of a German --or some other language-- example]

IOW, what I am trying to tell you is that normalization isn't really useful; it is, IMO, a stopgap solution along the path of Unicode evolution.

BUT of course there is no way to deal with the ambiguous "Busstop"

Indeed. For this case, you need to know what language "Busstop" was written in.

What I cannot do (or what I do not want to do) is to decide which of them other people want to use.

But isn't this just that? IOW, you're deciding what other people will NOT want to use if you throw the 'language' attribute (for each char) out of the window.

True, I am happy to do that.

NOT I am glad we have met :)

Why, you can always extend this. Store your string in any of the following ways:
1) every 2nd char is a language attribute, not a char
2) store the language attributes in a 2nd string, always pass both strings around

Of course, these and even more creative hacks could be devised. The question is, is the language an attribute of a Unicode character?

SameText('Istanbul', 'istanbul') can only return true when both 'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani.

OK, that's what I did not know. But still, in most cases it will be fine to do

SameText('Istanbul', 'istanbul', lGerman)
SameText('Istanbul', 'istanbul', lTurkish)

and decide at the time of comparing.

Well, the prototype I had in mind was:

SameText('Istanbul', 'istanbul', lGerman, lTurkish)

where the defaults for the latter 2 parameters would be lUnknown. This way, people who needn't be bothered about these would not even notice.

If, however, the info was stored on the string (or char), what if one was Turkish and the other German?

SameText('Istanbul', 'istanbul', lTurkish, lGerman)

This one must return FALSE since, in Turkish, the uppercased dotted small 'i' is a DOTTED capital 'i' (i.e. 'İ').

And

SameText('Istanbul', 'istanbul', lGerman, lGerman)

will return TRUE, since uppercasing both sides results in the same string.
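Here is a minimal Free Pascal sketch of the SameText(A, B, LangA, LangB) prototype proposed above. It assumes a hypothetical TLanguage enumeration (lUnknown, lGerman, lTurkish) and a hypothetical SameTextLang function; neither exists in the FPC RTL. Each side is lowercased with the rules of its own language, and the results are then compared code point by code point. Only ASCII plus the Turkish I/ı and İ/i pairs are handled.

```pascal
program langsametext;

{$mode objfpc}{$H+}

type
  // Hypothetical language tags, following the lGerman/lTurkish naming used
  // in the thread; not an existing FPC type.
  TLanguage = (lUnknown, lGerman, lTurkish);

const
  CapitalIWithDot = UnicodeChar($0130); // İ
  SmallDotlessI   = UnicodeChar($0131); // ı

// Lowercases one character: Turkish-specific rules for the I/i pairs,
// otherwise only the ASCII range is handled in this sketch.
function LowerChar(c: UnicodeChar; Lang: TLanguage): UnicodeChar;
begin
  if Lang = lTurkish then
  begin
    if c = 'I' then Exit(SmallDotlessI);   // I -> ı in Turkish
    if c = CapitalIWithDot then Exit('i'); // İ -> i in Turkish
  end;
  if (c >= 'A') and (c <= 'Z') then
    Result := UnicodeChar(Ord(c) + 32)
  else
    Result := c;
end;

function LowerLang(const S: UnicodeString; Lang: TLanguage): UnicodeString;
var
  i: Integer;
begin
  SetLength(Result, Length(S));
  for i := 1 to Length(S) do
    Result[i] := LowerChar(S[i], Lang);
end;

// Each side is case-folded with the rules of its own language, then compared.
function SameTextLang(const A, B: UnicodeString;
  LangA: TLanguage = lUnknown; LangB: TLanguage = lUnknown): Boolean;
begin
  Result := LowerLang(A, LangA) = LowerLang(B, LangB);
end;

begin
  WriteLn(SameTextLang('Istanbul', 'istanbul'));                    // TRUE
  WriteLn(SameTextLang('Istanbul', 'istanbul', lGerman, lGerman));  // TRUE
  WriteLn(SameTextLang('Istanbul', 'istanbul', lTurkish, lGerman)); // FALSE
end.
```

Folding to lowercase makes the (lTurkish, lGerman) call return FALSE, as the thread requires, because Turkish lowercases 'I' to 'ı'. Folding to uppercase instead would make that call TRUE and (lGerman, lTurkish) FALSE. For mixed-language comparisons, the choice of case direction is therefore a design decision in its own right.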
Re: [fpc-devel] Unicodestring branch, please test and help fixing
listmember wrote: Martin Friebe wrote: Just to make sure, all of this discussion is based on various collation No part of this discussion is based on collation. Ok, so we were talking about different things Here is a scenario for you: You have multilanguage text as data. Someone has asked you to search it and see if a certain peice of string (in a given language) exists in it. This search needs to be NOT case-sensitive. Actually for you example case doesn't matter. as you need to decide if "ss" = "ß" How can you do this? Is it doable if TCharacter (or wahtever you call it) has no 'langauge' attribite? For the purpose of case-sensitivity. I still do not know of a character or rather a pair of upper and lower case char) that maps different in some languages? Is there a pair of character "x" and "X" which should in some languages be matching upper/lower, but in other languages should not? ^^ ignore, found your example at the end of mail Otherwise how do I understand the case-insensitive part of your question? Because if "x" is the lowercase of "X" in *all* languages, then I do not need the language specific info to do the none-case-sensitive compare. Sorry if I am still missing some point... [Note that, here 'TCharacter' isn't necessarily an object; it might as well be a simple record structure.] Yes we agreed on this part Besides instead of storing it per char, you can use unused unicode as start/stop markers. So it can be implemented on top of a string that stores unicode-chars (and chars only, no attributes) Is there, in Unicode, start-stop markes that denote 'language'? 
I do not know, that was why I said "unused unicode" and "implemented on top" (as part of the specific app) IMHO The discussion splits here between: 1) How can this be done in a specific app 2) what should fpc provide as for 2: This would be on top of yet (afaik) missing basic functions such as Compare using collation x (where collation is given as argument to compare, not as part of any string) Why is language intrinsic to the text? An "A" is an "A" in any language. At best language is intrinsic to sorting/comparing(case on non case-sense) text Comparing is a lot more important an operation than collating --or, rather, collation is achieveable only if you can do proper comparisons. Take this, for example: "if SameText(SomeString, SomeOtherString) then do ..." For this to work properly, in both 'SomeString' and 'SomeOtherString', you need to know which language *each* character belongs to. I would rather say: "There are special cases where you need/want to know which language" So I do not imply how special or none special those cases are => you do not always need to know. (continued below on your example) If you dont have that informtaion, you might as well not have a SameText() function in FPC. Please note the 'case-INsensitive' keyword there. Well I needed an actual example where case sense differs by language (assuming we talk about language using the same charset (not comparing Chinese whit English). Here is a simple example for you: "if SameText('I am on FoolStrasse', 'I am on FoolStraße') then do ..." Well that is a good question, do you always want that to return the same? "Busstop" and "Bußtop" (Yeah the second is not a word, but could occur in a text) Also in Names this comparisons does not always apply. 
The name "Heiße" (originally with ß) can be spelled as "Heisse", but the name "Heisse" (originally with "ss") is never the same as "Heiße". But as for asking me: this is a specialized comparison, similar to Soundex (comparing the sound of 2 words, usually based on English). Something like this is usually found in extension libraries, but not in the standard functionality of (many/most) languages. In any case, I think this also has the minority problem. Most people do not want to compare Pascal strings this way (if only because of false positives). That does not mean that I say such functionality is not desirable. It would be great to have a unit that can be used if needed. Based on the idea that these are optional (or 3rd-party) functions, the normal string would not provide for this. (Besides, attaching info to each char would probably be too costly, even if implemented in the FPC core string.) Functions like this could take an additional structure declaring the start/stop/change point of every language. In any case, I can write up several different algorithms how to do that. Please do. SameText(), for one, will need all the help it can get. The initial comment was based on collation, and basically would have been about prioritizing in conflicts. There are 2 parts: 1) identifying the language. I would recommend a separate structure, with all language start points. It takes some work to maintain, but should work. Alternatively, use a dynarray instead of a string: define a record holding all the info per char that you need, and overload all operators.
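Martin's suggestion of a separate structure holding all language start points could look roughly like this in Free Pascal. It is a minimal sketch, not code from FPC or from anyone in this thread: TLangRun, TTaggedText, AddRun and LanguageAt are hypothetical names, languages are assumed to be tagged with short codes such as 'en' or 'tr', and runs are assumed to be added in ascending StartIndex order. The text itself stays a plain UnicodeString; only the lookup table is extra.

```pascal
program LanguageRuns;
{$mode objfpc}{$H+}

type
  { Hypothetical: one entry per point where the language changes.
    StartIndex is a 1-based index into TTaggedText.Text. }
  TLangRun = record
    StartIndex: SizeInt;
    Lang: string;            { assumed short code such as 'en', 'de', 'tr' }
  end;

  { The text stays a plain UnicodeString; language info lives beside it. }
  TTaggedText = record
    Text: UnicodeString;
    Runs: array of TLangRun; { kept sorted by StartIndex }
  end;

{ Appends a run; callers must add runs in ascending StartIndex order. }
procedure AddRun(var T: TTaggedText; StartIndex: SizeInt; const Lang: string);
var
  n: SizeInt;
begin
  n := Length(T.Runs);
  SetLength(T.Runs, n + 1);
  T.Runs[n].StartIndex := StartIndex;
  T.Runs[n].Lang := Lang;
end;

{ Binary search for the last run starting at or before Index;
  returns '' when no run covers Index. }
function LanguageAt(const T: TTaggedText; Index: SizeInt): string;
var
  Lo, Hi, Mid: SizeInt;
begin
  Result := '';
  Lo := 0;
  Hi := High(T.Runs);
  while Lo <= Hi do
  begin
    Mid := (Lo + Hi) div 2;
    if T.Runs[Mid].StartIndex <= Index then
    begin
      Result := T.Runs[Mid].Lang;
      Lo := Mid + 1;
    end
    else
      Hi := Mid - 1;
  end;
end;

var
  T: TTaggedText;
begin
  T.Text := 'Hello Istanbul';
  AddRun(T, 1, 'en');          { "Hello " }
  AddRun(T, 7, 'tr');          { "Istanbul" }
  WriteLn(LanguageAt(T, 3));   { en }
  WriteLn(LanguageAt(T, 7));   { tr }
end.
```

Keeping the runs outside the string means code that does not care about language pays nothing for it, which is the trade-off argued for above; the cost is that every edit to Text has to adjust the StartIndex values by hand.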
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Daniël Mantione wrote:
>
> On Fri, 12 Sep 2008, listmember wrote:
>
>> This search needs to be NOT case-sensitive.
>>
>> How can you do this?
>>
>> Is it doable if TCharacter (or whatever you call it) has no 'language'
>> attribute?
>
> 'I am on FoolStrasse' versus 'I am on FoolStraße' is not an upper/lower
> case issue. Strasse and Straße have the same casing. So yes, you can do
> case-insensitive search.
>
> The problem you describe does exist. ü and ue are equivalent in German,

Not in both directions.

> but not in Dutch. So someone searching for ü will also want to receive
> results for ue,

___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Fri, 12 Sep 2008, listmember wrote: This search needs to be NOT case-sensitive. How can you do this? Is it doable if TCharacter (or whatever you call it) has no 'language' attribute? 'I am on FoolStrasse' versus 'I am on FoolStraße' is not an upper/lower case issue. Strasse and Straße have the same casing. So yes, you can do case-insensitive search. The problem you describe does exist. ü and ue are equivalent in German, but not in Dutch. So someone searching for ü will also want to receive results for ue; a Dutch-speaking person would not. This, however, should not be fixed at the string level, but at the file format level. I.e. in HTML you can do . You could design a #27 escape code for text files if you'd like. Daniël
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Quoting listmember <[EMAIL PROTECTED]>:

>[...]
> You have multilanguage text as data. Someone has asked you to search it
> and see if a certain piece of string (in a given language) exists in it.
>
> This search needs to be NOT case-sensitive.
>
> How can you do this?
>
> Is it doable if TCharacter (or whatever you call it) has no 'language'
> attribute?
>
> [Note that, here 'TCharacter' isn't necessarily an object; it might as
> well be a simple record structure.]

AFAIK for most programmers this is not a common task. Most programs need less (one language or codepage) or more (phonetic, semantic, statistical search).
Can you explain why you think that this particular problem requires compiler magic?

> []
> Are there, in Unicode, start/stop markers that denote 'language'?

Is it needed? Are there any Unicode characters whose upper/lower case depends on language?

>[...]
> Comparing is a lot more important an operation than collating --or,
> rather, collation is achievable only if you can do proper comparisons.
>
> Take this, for example:
>
> "if SameText(SomeString, SomeOtherString) then do ..."
>
> For this to work properly, in both 'SomeString' and 'SomeOtherString',
> you need to know which language *each* character belongs to.

Comparing texts can be done with various meanings. For example: byte comparison, simple case-insensitive comparison, non-literal comparison, compare like this library, ... Which one do you mean?

>[...]
> Here is a simple example for you:
>
> "if SameText('I am on FoolStrasse', 'I am on FoolStraße') then do ..."
>
> Now.. how are you going to decide that SameText() function here returns
> true unless you have information that the substring 'FoolStraße' is in
> German?

The two strings have the same language, but are written with different spelling rules (Rechtschreibung). You need dictionaries and spelling systems to implement such comparisons. This is beyond a compiler or an RTL.
> I know that this is a very simple example --that 'ß' exists only in
> German, and that you could infer that when you met that char.
>
> But, this highlights the problem --and there are times when you cannot
> infer.
>
> > In any case, I can write up several different algorithms how to do that.
>
> Please do. SameText(), for one, will need all the help it can get.
>
> > What I can not do (or what I do not want to do) is to decide which of
> > them other people do want to use.
>
> But, isn't this just that: IOW, you're deciding what other people will
> NOT want to use if you throw the 'language' attribute (for each char)
> out of the window..

What about loan words?

>
> Or, if this is not what you think of, please clarify by example..
>
> Here is another typical example:
>
> SameText('Istanbul', 'istanbul') can only return true when both
> 'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani.
>
> Otherwise, the same SameText() has to return false.

I doubt that it is that easy.

Mattias
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Martin Friebe wrote: Just to make sure, all of this discussion is based on various collation No part of this discussion is based on collation. I am going to leave out the object question for now. I said all I can say in earlier mails. That's good. Thank you. And also from your comments it appears more a question of collation > being stored with the string, substring, or even each char. Martin, are you doing this on purpose? I mean, are you intentionally driving me up the wall? Seriously. Can't you forget/drop this 'collation' word?! And, then, think a little deeper. Here is a scenario for you: You have multilanguage text as data. Someone has asked you to search it and see if a certain piece of string (in a given language) exists in it. This search needs to be NOT case-sensitive. How can you do this? Is it doable if TCharacter (or whatever you call it) has no 'language' attribute? [Note that, here 'TCharacter' isn't necessarily an object; it might as well be a simple record structure.] As found in the last mail, there is currently no standard for handling cross-collation in any string function (that is, any string function which could be collation-based). 1) IMHO only few people would need this. For the majority it would be unwanted overhead. 2) Within those few, there would be too many different expectations as to what the "standard" should be. If FPC chose one such standard at will, it would benefit almost no one. You're still stuck with that wretched word 'collation'. The best FPC could do is provide storage for something that is not handled or obeyed in any function handling the data. This doesn't sound desirable to me. If anyone who needs it will have to implement the functions, then they may add their own storage for it too. Besides, instead of storing it per char, you can use unused Unicode as start/stop markers.
So it can be implemented on top of a string that stores Unicode chars (and chars only, no attributes). Are there, in Unicode, start/stop markers that denote 'language'? All the others are not an intrinsic part of a char at all --they vary by context. Why is language intrinsic to the text? An "A" is an "A" in any language. At best, language is intrinsic to sorting/comparing (case-sensitive or not) text. Comparing is a lot more important an operation than collating --or, rather, collation is achievable only if you can do proper comparisons. Take this, for example: "if SameText(SomeString, SomeOtherString) then do ..." For this to work properly, in both 'SomeString' and 'SomeOtherString', you need to know which language *each* character belongs to. If you don't have that information, you might as well not have a SameText() function in FPC. Please note the 'case-INsensitive' keyword there. Well, I needed an actual example where case sensitivity differs by language (assuming we talk about languages using the same charset, not comparing Chinese with English). Here is a simple example for you: "if SameText('I am on FoolStrasse', 'I am on FoolStraße') then do ..." Now.. how are you going to decide that SameText() function here returns true unless you have information that the substring 'FoolStraße' is in German? I know that this is a very simple example --that 'ß' exists only in German, and that you could infer that when you met that char. But, this highlights the problem --and there are times when you cannot infer. In any case, I can write up several different algorithms how to do that. Please do. SameText(), for one, will need all the help it can get. What I cannot do (or what I do not want to do) is to decide which of them other people do want to use. But, isn't this just that: IOW, you're deciding what other people will NOT want to use if you throw the 'language' attribute (for each char) out of the window.. Or, if this is not what you think of, please clarify by example..
Here is another typical example: SameText('Istanbul', 'istanbul') can only return true when both 'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani. Otherwise, the same SameText() has to return false.
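To make the Turkish/Azerbaijani case concrete, here is a sketch of a case fold that takes the language as a parameter instead of reading it from each character. FoldCase and SameTextLang are hypothetical helpers, not RTL functions; apart from I and U+0130 (capital I with dot above), only ASCII A..Z is lower-cased, whereas a real implementation would need the full Unicode case-mapping tables.

```pascal
program TurkishFold;
{$mode objfpc}{$H+}

{ Hypothetical helper: folds case for comparison. The dotted/dotless I
  rules apply only when Lang is 'tr' or 'az'; otherwise only ASCII
  A..Z is lower-cased. }
function FoldCase(const S: UnicodeString; const Lang: string): UnicodeString;
var
  i: SizeInt;
  c: WideChar;
  Turkic: Boolean;
begin
  Turkic := (Lang = 'tr') or (Lang = 'az');
  Result := S;
  for i := 1 to Length(Result) do
  begin
    c := Result[i];
    if Turkic and (Ord(c) = Ord('I')) then
      Result[i] := WideChar($0131)                  { I -> dotless i }
    else if Turkic and (Ord(c) = $0130) then
      Result[i] := WideChar(Ord('i'))               { I with dot above -> i }
    else if (Ord(c) >= Ord('A')) and (Ord(c) <= Ord('Z')) then
      Result[i] := WideChar(Ord(c) + 32);           { plain ASCII lower-casing }
  end;
end;

{ Hypothetical: each side is folded according to its own language tag. }
function SameTextLang(const A, B: UnicodeString;
  const LangA, LangB: string): Boolean;
begin
  Result := FoldCase(A, LangA) = FoldCase(B, LangB);
end;

begin
  WriteLn(SameTextLang('Istanbul', 'istanbul', 'en', 'en'));  { TRUE }
  WriteLn(SameTextLang('Istanbul', 'istanbul', 'tr', 'tr'));  { FALSE }
end.
```

When the two sides carry different tags, e.g. SameTextLang('Istanbul', 'istanbul', 'tr', 'en'), the sketch returns False; whether that is the desired answer is exactly the cross-language policy question the thread keeps coming back to.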
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Just to make sure, all of this discussion is based on various collations for European languages? Or shall we include Arabic, Chinese and other languages? But they have their own chars; they can be identified without collation, so they do not need the language info to be distinguished from European text. (They may have collations, the same as a German text could be handled in different collations.) listmember wrote: So maybe the design is quite well thought out? Adding a flag field is easy enough --if all you're doing is some sort of collation. In that sense, everything is well thought out. But.. Life becomes very complicated when you begin to do things like FTS (full text search) on a multilanguage text in a DB engine. Your options, in this case, are very limited: -- Ignore the language issue, or -- store each language in a different field (that is, if you know how many there will be). Do you think this is a good solution --or a hack? True, that would be hard to do (in a DB or in Pascal, or most other places). But again, this is a very special case, and that is why none of the frameworks (DB, Pascal, ...) include it. You have to do your own solution. At no time did I say (nor did, AFAIK, anyone else say) that you cannot do your own object-based text holding objects. The questions were: 1) should FPC replace the string by an object (like Java)? 2) which additional attributes should be stored by a string (per string / per char)? And actually both of those questions can be moved out of the context of the Unicode implementation, because both of them could also be applied to current (char=byte) based strings. I am going to leave out the object question for now. I said all I can say in earlier mails. And also from your comments it appears more a question of collation being stored with the string, substring, or even each char.
As found in the last mail, there is currently no standard for handling cross-collation in any string function (that is, any string function which could be collation-based). 1) IMHO only few people would need this. For the majority it would be unwanted overhead. 2) Within those few, there would be too many different expectations as to what the "standard" should be. If FPC chose one such standard at will, it would benefit almost no one. The best FPC could do is provide storage for something that is not handled or obeyed in any function handling the data. This doesn't sound desirable to me. If anyone who needs it will have to implement the functions, then they may add their own storage for it too. Besides, instead of storing it per char, you can use unused Unicode as start/stop markers. So it can be implemented on top of a string that stores Unicode chars (and chars only, no attributes). As for storing info per string or per char (info could be anything: collation, color, style, font, source-of-quote, author, creation-date, file, ...), everyone would like their own. So again FPC shouldn't do it, or everyone gets all the overhead of what all the others wanted. Collation is a function of language. Right, but language is something you can apply to strings. You are not forced to do so. Strings work very well without language too. Same as you saying "no GUI": strings work without display. Font/style is a function of rendering. I may want to search a string but only want to look at chars marked as bold. Language is an extension to string, in the same way as rendering info or source info is. To you, language may matter a great deal. To others, other attributes will matter. All the others are not an intrinsic part of a char at all --they vary by context. Why is language intrinsic to the text? An "A" is an "A" in any language. At best, language is intrinsic to sorting/comparing (case-sensitive or not) text. If Pascal doesn't suit the needs of a specific task, choose a different tool.
Instead of inventing a new Pascal. Thank you for the advice. But, instead of jailing this discussion to an at best laterally relevant issue of collation, can I ask you to think for a moment: How on earth can you do a case-INsensitive search in *any* given string that contains multiple language substrings? Please note the 'case-INsensitive' keyword there. Well, I needed an actual example where case sensitivity differs by language (assuming we talk about languages using the same charset, not comparing Chinese with English). In any case, I can write up several different algorithms how to do that. What I cannot do (or what I do not want to do) is to decide which of them other people do want to use. Search non-case-sensitively for 'UP LOW' in ' ups upper lows lower' with the following attributes: 'UP LOW' is a string of 2 languages. The word UP is in a language that defines "U" and "u" as different letters (not only differing by case, but differing the same way "a" and "b" differ). The word LOW is in a language where all letters have lower-case equivalents (as in Engl
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Hi, Thanks for pointing me to the Lazarus thread about this and the bug report. Checked them. But as I understand it, there is no solution available at the moment for this. I have a database that is not encoded in UTF-8 (and it never will be, because other client programs are accessing it and their users do not want/need to be converted to Unicode). How do I get the field values into a string variable in FPC/Lazarus? Right now the non-Unicode strings are returned as empty from a database field due to FCL conversion functions, not to mention writing something back to the database. Is there a function to convert 'My Perfect™ World®' to whatever format the components require, and vice versa? Something for the ASCII table up to #255 (English letters with some special characters like the above example). JoshyFun wrote: Hello ABorka, Thursday, September 11, 2008, 7:26:50 PM, you wrote: A> The database field can contain any string with '®' in it for this to happen A> for example: 'sometext®' A> It seems that A> ListBox1.Items.Add(SQL1.FieldByName('MyTableField').AsString); [...] A> will only put an empty string into the Listbox. A> Somewhere inside FCL, where the Listbox item is inserted there is a A> UTF8Decode which ends up with the empty string because of the '®' #174 A> character it thinks that it is a unicode encoded character and tries to A> get the additional bytes for it which ain't there. http://bugs.freepascal.org/view.php?id=11791 A> Not sure how can this be circumvented (using some conversion function?) A> or if it is a bug. Check the Lazarus list; there is a quite recent thread about that ("Unicode and DBAware" is the text of the subject).
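One way around this, assuming the data really is Windows-1252, is to convert each field value to UTF-8 before it reaches the UTF-8 based controls; the ConvertEncoding(..., 'cp1252', 'utf8') call mentioned earlier in this thread does that. To show what such a conversion involves, here is a self-contained sketch: bytes $00..$7F and $A0..$FF map to the same code points, and $80..$9F go through a lookup table (the five positions undefined in cp1252 are passed through unchanged, which is an assumption). Cp1252ToUtf8 and CodePointToUtf8 are hypothetical helpers, not FPC or Lazarus API. A lone #174 is not a valid UTF-8 sequence, which is why decoding it as UTF-8 fails; after conversion it becomes the two bytes $C2 $AE.

```pascal
program Cp1252Demo;
{$mode objfpc}{$H+}

const
  { Windows-1252 bytes $80..$9F mapped to Unicode code points. The five
    positions undefined in cp1252 are passed through unchanged (assumption). }
  Cp1252High: array[$80..$9F] of Word = (
    $20AC, $0081, $201A, $0192, $201E, $2026, $2020, $2021,
    $02C6, $2030, $0160, $2039, $0152, $008D, $017D, $008F,
    $0090, $2018, $2019, $201C, $201D, $2022, $2013, $2014,
    $02DC, $2122, $0161, $203A, $0153, $009D, $017E, $0178);

{ Encodes one BMP code point (< $10000) as UTF-8 bytes. }
function CodePointToUtf8(cp: Cardinal): string;
begin
  if cp < $80 then
    Result := Chr(cp)
  else if cp < $800 then
    Result := Chr($C0 or (cp shr 6)) + Chr($80 or (cp and $3F))
  else
    Result := Chr($E0 or (cp shr 12)) + Chr($80 or ((cp shr 6) and $3F)) +
              Chr($80 or (cp and $3F));
end;

{ Hypothetical helper: converts a cp1252 byte string to a UTF-8 byte string. }
function Cp1252ToUtf8(const S: string): string;
var
  i: SizeInt;
  b: Byte;
  cp: Cardinal;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    b := Ord(S[i]);
    if (b >= $80) and (b <= $9F) then
      cp := Cp1252High[b]
    else
      cp := b;   { $00..$7F and $A0..$FF equal U+0000..U+00FF }
    Result := Result + CodePointToUtf8(cp);
  end;
end;

var
  Raw: string;
begin
  Raw := 'My Perfect'#$99' World'#$AE;   { bytes as stored in a cp1252 table }
  WriteLn(Length(Raw));                   { 18 }
  WriteLn(Length(Cp1252ToUtf8(Raw)));     { 21: #$99 -> 3 bytes, #$AE -> 2 }
end.
```

Writing back to the database would need the inverse mapping (UTF-8 to cp1252), which this sketch leaves out.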
Re: [fpc-devel] Unicodestring branch, please test and help fixing
So maybe the design is quite well thought out? Adding a flag field is easy enough --if all you're doing is some sort of collation. In that sense, everything is well thought out. But.. Life becomes very complicated when you begin to do things like FTS (full text search) on a multilanguage text in a DB engine. Your options, in this case, are very limited: -- Ignore the language issue, or -- store each language in a different field (that is, if you know how many there will be). Do you think this is a good solution --or a hack? As for storing info per string or per char (info could be anything: collation, color, style, font, source-of-quote, author, creation-date, file, ...), everyone would like their own. So again FPC shouldn't do it, or everyone gets all the overhead of what all the others wanted. Collation is a function of language. All the others are not an intrinsic part of a char at all --they vary by context. Also, FPC is a programming language, not a word processing tool. Well, they should have remembered that before they added char and string types, when everything could perfectly well be represented with a byte. Then instead of asking for strings as objects, I would ask for an additional ref-counted object type (with auto destruction). The string library could be based on this. I am not asking for such a thing because a) it wouldn't be Pascal anymore, and b) beware of the mem-leaks. Personally, I gave up on strings as objects at the compiler level. That could, of course, be added as a lib. If Pascal doesn't suit the needs of a specific task, choose a different tool, instead of inventing a new Pascal. Thank you for the advice. But, instead of jailing this discussion to an at best laterally relevant issue of collation, can I ask you to think for a moment: How on earth can you do a case-INsensitive search in *any* given string that contains multiple language substrings? Please note the 'case-INsensitive' keyword there. Btw, in normal math you cannot divide a number by zero...
Of course you can define your own math. And, the point is??..
Re: [fpc-devel] Unicodestring branch, please test and help fixing
listmember wrote: I also do not know of other apps that could do this. (And it may not be possible.) Look around. Databases, for example: AFAIK the most you can do is define a collation per column. True. But that does not mean that those apps/databases are well thought out. Does it? Point of view. Those DBs get sold, so either people take what they can get and silently accept it (I haven't seen discussions like this on related DB discussion groups [or maybe I read the wrong groups :)]) or the majority of people don't need it. BTW, people want their DB to sort text in a way that helps finding entries in the result. So the ordering process should not rely on knowing whether a word is English or French. If it did rely on the language, then the ordering would not help the search, because you would have to know the language of all other words to find the one word you are looking for. So maybe the design is quite well thought out? And how would you sort the following example, with mixed collation? Take the various German collations. ae can be used as a substitution for a-umlaut. This is actually an arbitrary decision --there is no agreed standard on this that I am aware of-- so each developer can have their own way. Well, yes, of course you can define how to. But then everyone has a different need, and a different definition. That would mean FPC had to implement dozens of algorithms. So it seems better to leave it to each person, as it seems it will be an individual thing anyway. As for storing info per string or per char (info could be anything: collation, color, style, font, source-of-quote, author, creation-date, file, ...), everyone would like their own. So again FPC shouldn't do it, or everyone gets all the overhead of what all the others wanted. Also, FPC is a programming language, not a word processing tool. And FPC is Pascal. Pascal (AFAIK) has reference-counted strings, and objects are not reference counted.
Not to mention that objects (as a string type) would only be of benefit if everyone was allowed to create their own child classes. Then instead of asking for strings as objects, I would ask for an additional ref-counted object type (with auto destruction). The string library could be based on this. I am not asking for such a thing because a) it wouldn't be Pascal anymore, and b) beware of the mem-leaks. If Pascal doesn't suit the needs of a specific task, choose a different tool, instead of inventing a new Pascal. I don't do shell scripts in Pascal. And simple web scripts are PHP or Perl. How would you sort data where one source is of one collation, the other source of another (or, even worse, the collation changes halfway through)? It is impossible by definition. No. It is not impossible. But, yes, there is no definition (standard). It would be up to the developer or the entity that the developer is working in. Btw, in normal math you cannot divide a number by zero... Of course you can define your own math. I even think that collation is not part of the string. It does not change the meaning of the string. It is only used in specific operations. And then it must be one collation for both strings. So if each of the strings had a collation, that would cause an issue. But my question is --IMHO-- a lot more relevant to the thread at hand: How would you do a case-insensitive search in a multilingual text? The same as above applies. If every char (or substring) has a collation of its own, then you need to define how to compare cross-collation, because find('E'[collation1], 'merci'[collation2] + 'mein herr'[collation3]) needs to compare an E (that wants collation1 for the compare) with each of the 'e's (that want other collations). Maybe collation1 says that E should be equal in upper and lower case, while the other collations do not? Or vice versa. There is no standard. [This has nothing to do with rendering or GUI.]
Re: [fpc-devel] Unicodestring branch, please test and help fixing
IMHO you can't? But you could use a TStringList. I don't think I could, because in a TStringList you have no way of knowing what language each item belongs to. You could, of course, work around it by adding a fake object to each item denoting the language, but that does not make a generalized solution. I also do not know of other apps that could do this. (And it may not be possible.) Look around. Databases, for example: AFAIK the most you can do is define a collation per column. True. But that does not mean that those apps/databases are well thought out. Does it? And how would you sort the following example, with mixed collation? Take the various German collations. ae can be used as a substitution for a-umlaut. This is actually an arbitrary decision --there is no agreed standard on this that I am aware of-- so each developer can have their own way. How would you sort data where one source is of one collation, the other source of another (or, even worse, the collation changes halfway through)? It is impossible by definition. No. It is not impossible. But, yes, there is no definition (standard). It would be up to the developer or the entity that the developer is working in. I even think that collation is not part of the string. It does not change the meaning of the string. It is only used in specific operations. And then it must be one collation for both strings. So if each of the strings had a collation, that would cause an issue. But my question is --IMHO-- a lot more relevant to the thread at hand: How would you do a case-insensitive search in a multilingual text? [This has nothing to do with rendering or GUI.]
Re: [fpc-devel] Unicodestring branch, please test and help fixing
listmember wrote: Actually, UTF-8 can contain bidi info; it's indeed a matter of the renderer. And how do you propose doing a case-insensitive search in a given text that contains multiple languages? I assume you speak of multiple collations in one string? IMHO you can't? But you could use a TStringList. I also do not know of other apps that could do this. (And it may not be possible.) Look around. Databases, for example: AFAIK the most you can do is define a collation per column. And how would you sort the following example, with mixed collation? Take the various German collations. ae can be used as a substitution for a-umlaut. In some collations it sorts as ae (between ad and af); in others it sorts as "a-umlaut" (immediately after "a"): 1) a, ab, ae 2) a, ae, ab. How would you sort data where one source is of one collation, the other source of another (or, even worse, the collation changes halfway through)? It is impossible by definition, because, taking the 2 strings above, each of them can come first when sorted depending on the collation; if more than one collation were involved, the result would be undefined. I even think that collation is not part of the string. It does not change the meaning of the string. It is only used in specific operations. And then it must be one collation for both strings. So if each of the strings had a collation, that would cause an issue. Martin
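The two orderings above can be reproduced with a sort that compares collation keys rather than raw strings. This is an illustration under simplified assumptions: KeyPlain and KeyUmlaut are hypothetical stand-ins for two German collations, and KeyUmlaut models "ae as a-umlaut, sorting immediately after a" by rewriting "ae" to 'a'#1, since #1 compares lower than any letter in FPC's byte-wise string comparison.

```pascal
program MixedCollation;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TKeyFunc = function(const S: string): string;

{ Collation 1 (hypothetical): plain byte order, "ae" is just a then e. }
function KeyPlain(const S: string): string;
begin
  Result := S;
end;

{ Collation 2 (hypothetical): "ae" stands for a-umlaut and sorts directly
  after "a"; 'a'#1 lands between "a" and "ab" in a byte-wise compare. }
function KeyUmlaut(const S: string): string;
begin
  Result := StringReplace(S, 'ae', 'a'#1, [rfReplaceAll]);
end;

{ Insertion sort that compares collation keys instead of the raw strings. }
procedure SortBy(var A: array of string; Key: TKeyFunc);
var
  i, j: Integer;
  Tmp: string;
begin
  for i := 1 to High(A) do
  begin
    Tmp := A[i];
    j := i - 1;
    while (j >= 0) and (Key(A[j]) > Key(Tmp)) do
    begin
      A[j + 1] := A[j];
      Dec(j);
    end;
    A[j + 1] := Tmp;
  end;
end;

function Join(const A: array of string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to High(A) do
  begin
    if i > 0 then
      Result := Result + ', ';
    Result := Result + A[i];
  end;
end;

var
  Words: array[0..2] of string = ('ae', 'ab', 'a');
begin
  SortBy(Words, @KeyPlain);
  WriteLn(Join(Words));    { a, ab, ae }
  SortBy(Words, @KeyUmlaut);
  WriteLn(Join(Words));    { a, ae, ab }
end.
```

Each key function gives a well-defined order on its own. The sketch has no sensible answer for a list in which some entries should use KeyPlain and others KeyUmlaut, which is the "undefined by definition" case described above.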
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Actually, UTF-8 can contain bidi info; it's indeed a matter of the renderer. And how do you propose doing a case-insensitive search in a given text that contains multiple languages?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Thu, 11 Sep 2008 22:56:49 +0200
Martin Schreiber <[EMAIL PROTECTED]> wrote:

>[...]
> > Doesn't that mean we will be --by design-- unable to write something
> > like 'Yom Kippur (יוֹם כִּפּוּר)' on a caption?

Yes and more. See below.

> > This is why I keep asking that the 'TCharacter' or 'TChar' needs to
> > have a language attribute.
> >
> MSEgui has a richstringty type, a combination of a widestring and a
> dynamic array of formatting info. There are formatting infos for the
> changes only, a richstringty without formatting info has a nil
> pointer for the dynamic array. See lib/common/kernel/mserichstring.pas
> http://sourceforge.net/projects/mseide-msegui/
>
> "
> type
> newinfoty = (ni_bold=ord(fs_bold),ni_italic=ord(fs_italic),
> ni_underline=ord(fs_underline),ni_strikeout=ord(fs_strikeout),
> ni_selected=ord(fs_selected),
> //same order as in fontstylety
> ni_fontcolor,ni_colorbackground,ni_delete);
> newinfosty = set of newinfoty;
>
> const
> fonthandleflags = [ni_bold,ni_italic];
> fontstyleflags =
> [ni_bold,ni_italic,ni_underline,ni_strikeout,ni_selected];
>
> type
> charstylety = record
> fontcolor,colorbackground: pcolorty;
> fontstyle: fontstylesty;
> end;
> pcharstylety = ^charstylety;
>
> charstylearty = array of charstylety;
>
> formatinfoty = record
> index: integer;//0-> from first char
> newinfos: newinfosty;
> style: charstylety;
> end;
>
> pformatinfoty = ^formatinfoty;
> formatinfoarty = array of formatinfoty;
> pformatinfoarty = ^formatinfoarty;
>
> richstringty = record
> text: msestring;
> format: formatinfoarty;
> end;
> "
>
> It was designed for fast processing in MSEide source code editor.

It is fast, but it misses some Unicode features, like compound characters. For example, the Mac OS X file system uses compound characters for German umlauts. MSEide shows the o-umlaut as an o followed by a box. Lazarus SynEdit under gtk2 shows it correctly, because it uses Pango, which has an almost complete Unicode implementation.
But editing is wrong in SynEdit, because it does not handle compound characters yet. Fortunately, typing an o-umlaut creates a 'normal' single character in SynEdit. The native gtk2 widgets like TButton and TEdit handle compound characters correctly. I wonder how a TCharacter would be defined that supports all Unicode features. Probably it will be a monster that only few text editors want to use. Mattias
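The o-umlaut problem described above comes from the same letter having two encodings: the precomposed U+00F6, or "o" followed by U+0308 COMBINING DIAERESIS. The sketch below uses hypothetical helper names and a deliberately incomplete definition of "combining mark" (only the U+0300..U+036F block). It shows that the two forms differ in Length and compare unequal unless the text is normalized first, which is one plausible reason an editor that assumes one WideChar per visible character draws the mark as a separate box.

```pascal
program CombiningMarks;
{$mode objfpc}{$H+}

{ Hypothetical helper: only the Combining Diacritical Marks block
  (U+0300..U+036F) is recognised; a real check needs the Unicode
  character database. }
function IsCombiningMark(c: WideChar): Boolean;
begin
  Result := (Ord(c) >= $0300) and (Ord(c) <= $036F);
end;

{ Counts base characters; combining marks are attached to the character
  before them. Surrogate pairs and other grapheme rules are ignored. }
function VisibleLength(const S: UnicodeString): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if not IsCombiningMark(S[i]) then
      Inc(Result);
end;

var
  Precomposed, Decomposed: UnicodeString;
begin
  Precomposed := WideChar($00F6);                 { o-umlaut as one code point }
  Decomposed := 'o';
  Decomposed := Decomposed + WideChar($0308);     { o + COMBINING DIAERESIS }
  WriteLn(Length(Precomposed), ' ', Length(Decomposed));               { 1 2 }
  WriteLn(VisibleLength(Precomposed), ' ', VisibleLength(Decomposed)); { 1 1 }
  WriteLn(Precomposed = Decomposed);  { FALSE: binary compare, no normalization }
end.
```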
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Martin Schreiber wrote: On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote: I've continued to work on support of a unicodestring type in fpc. It's currently in an svn branch at: http://svn.freepascal.org/svn/fpc/branches/unicodestring and will be merged later to trunk. The unicodestring type is a ref. counted utf-16 string. On non-windows, widestring is mapped to this type. If you're interested in unicode support please test, give feedback here and submit fixes. I have a crash in MSEide startup in a procedure finalization section:
"
#0 77892373 :0 ??()
#1 0082CDF4 :0 U_SYSTEM_ENTRYINFORMATION()
#2 03B7FB2C :0 ??()
#3 03B7FAAC :0 ??()
#4 03C22C1C :0 ??()
#5 0082D9F4 :0 U_SYSTEM_FREELISTS()
#6 03B7F874 :0 ??()
#7 0040F5EB heap.inc:1127 SYSFREEMEM(P=0x0)
#8 778922F8 heap.inc:0 ??()
#9 0082E500 heap.inc:0 U_HEAPTRC_OWNFILE()
#10 00410482 systhrd.inc:300 SYSENTERCRITICALSECTION(CS=void)
#11 0040FE94 thread.inc:190 ENTERCRITICALSECTION(CS={DEBUGINFO = 0x0, LOCKCOUNT = 1, RECURSIONCOUNT = 0, OWNINGTHREAD = 0, LOCKSEMAPHORE = 812, SPINCOUNT = 0})
#12 00414571 heaptrc.pp:666 TRACEFREEMEMSIZE(P=0x6d11b8, SIZE=0)
#13 004146BB heaptrc.pp:722 TRACEFREEMEM(P=0x6d11b8)
#14 0040E404 heap.inc:275 FREEMEM(P=0x6d11b8)
#15 004093FA ustrings.inc:179 DISPOSEUNICODESTRING(S=0x6d11b8)
#16 0040947D ustrings.inc:206 fpc_unicodestr_decr_ref(S=0x6d11b8)
#17 004A9B1C msesysintf.pas:306 WINFILEPATH(DIRNAME=0x0, FILENAME=0x6d11b8, result=0x3c22fa8)
#18 004AB63F msesysintf.pas:1436 SYS_OPENDIRSTREAM(STREAM={INFOLEVEL = FIL_NAME, DIRNAME = 0x3c23148, MASK = 0x3b7faa0, INCLUDE = [FA_ALL], EXCLUDE = [], PLATFORMDATA = {0, 208983208, 1, 4294967295, 0, 0, 0, 0}})
#19 004B5BB2 msefileutils.pas:640 SEARCHFILE(AFILENAME=0xc7a07b0, ADIRNAME=0x3c22e08, result=0x0)
#20 004B5DED msefileutils.pas:671 SEARCHFILE(AFILENAME=0x6bf2f8, ADIRNAMES=0x3c22ed8, highADIRNAMES=0, result=0x0)
#21 004B5F9C msefileutils.pas:698 FINDFILE(FILENAME=0x6bf2f8, DIRNAMES=0x3c22ed8, PATH=0x0, highDIRNAMES=0)
#22 004C03E1 msestatfile.pas:244 TSTATFILE__READSTAT(STREAM=0x0, this=0x3c6b918)
#23 00453CF8 main.pas:1514 TMAINFO__MAINONLOADED(SENDER=0x3c03d40, this=0x3c03d40)
#24 0050A717 mseforms.pas:854 TCUSTOMMSEFORM__DOEVENTLOOPSTART(this=0x3c03d40)
#25 0050A763 mseforms.pas:863 TCUSTOMMSEFORM__RECEIVEEVENT(EVENT=0xc7016f8, this=0x3c03d40)
#26 0048CA3A mseevent.pas:213 TOBJECTEVENT__DELIVER(this=0xc7016f8)
#27 0042E7D0 msegui.pas:12666 TINTERNALAPPLICATION__EVENTLOOP(AMODALWINDOW=0x0, ONCE=false, this=0x3bd9460)
#28 0042F52C msegui.pas:13063 TINTERNALAPPLICATION__DOEVENTLOOP(ONCE=false, this=0x3bd9460)
#29 0048B3F8 mseapplication.pas:1132 TCUSTOMAPPLICATION__RUN(this=0x3bd9460)
#30 004025D1 mseide.pas:59 main()
"
I could not find a simple program to demonstrate the failure. Something strange is that the following procedure calls fpc_WideStr_Decr_Ref in its finalization section:
"
const
  quotechar = unicodechar('"');

procedure requote(var path: unicodestring; const newvalue: unicodestring);
begin
  if punicodechar(path)^ = quotechar then begin
    path:= quotechar + newvalue;
  end
  else begin
    path:= newvalue;
  end;
end;
"
I saw that you merged unicodestring to trunk. Should I test with trunk instead of the unicodestring branch? Yes. The unicodestring branch is closed.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Thursday 11 September 2008 22.33:32 listmember wrote: > >> procedure TLabel.Paint(...) > >> begin > >> if *Caption.IsRTL *then > >> DrawCaptionRTL(0,0,*Caption.AsUTF8*, flags) > >> else > >> DrawCaption(0,0,*Caption.AsUTF8*, flags); > >> end; > >> > >> Is not that enough? > > > > What is the gain as opposed to > > > > procedure TLabel.Paint(...) > > begin > >if IsRTL(Caption) then > > DrawCaptionRTL(0,0,AsUTF8(Caption), flags) > > else > > DrawCaption(0,0,AsUTF8(Caption), flags); > > end; > > > > In other words where is the benefit from OOP in this ? > > IMO, both are deficient as they both assume that a string block (text) > is either RTL or LTR. > > Doesn't that mean we will be --by design-- unable to write something > like 'Yom Kippur (יוֹם כִּפּוּר)' on a caption? > > This is why I keep asking that the 'TCharacter' or 'TChar' needs to have > a language attribute. > MSEgui has a richstringty type, a combination of a widestring and a dynamic array of formatting info. There are formatting infos for the changes only, a richstringty without formatting info has a nil pointer for the dynamic array. 
See lib/common/kernel/mserichstring.pas http://sourceforge.net/projects/mseide-msegui/ " type newinfoty = (ni_bold=ord(fs_bold),ni_italic=ord(fs_italic), ni_underline=ord(fs_underline),ni_strikeout=ord(fs_strikeout), ni_selected=ord(fs_selected), //same order as in fontstylety ni_fontcolor,ni_colorbackground,ni_delete); newinfosty = set of newinfoty; const fonthandleflags = [ni_bold,ni_italic]; fontstyleflags = [ni_bold,ni_italic,ni_underline,ni_strikeout,ni_selected]; type charstylety = record fontcolor,colorbackground: pcolorty; fontstyle: fontstylesty; end; pcharstylety = ^charstylety; charstylearty = array of charstylety; formatinfoty = record index: integer;//0-> from first char newinfos: newinfosty; style: charstylety; end; pformatinfoty = ^formatinfoty; formatinfoarty = array of formatinfoty; pformatinfoarty = ^formatinfoarty; richstringty = record text: msestring; format: formatinfoarty; end; " It was designed for fast processing in MSEide source code editor. Martin
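To make the "formatting info for the changes only" idea concrete, here is a minimal Free Pascal sketch under stated assumptions. The names TRichText, TFormatChange and StyleAt are hypothetical and are not the mserichstring API; unlike MSEgui's formatinfoty, which records which attributes change via newinfos, this simplified version stores the complete style at each change point.

```pascal
program richtextsketch;

{$mode objfpc}{$H+}

type
  // Hypothetical, simplified style flags (not MSEgui's newinfoty)
  TStyleFlag = (sfBold, sfItalic, sfUnderline);
  TStyleFlags = set of TStyleFlag;

  // One entry per *change* of formatting
  TFormatChange = record
    Index: Integer;      // 1-based position of the first character affected
    Style: TStyleFlags;  // complete style from Index onwards
  end;
  TFormatChanges = array of TFormatChange;

  TRichText = record
    Text: UnicodeString;
    Format: TFormatChanges; // empty (nil) array => plain text, no extra heap block
  end;

// Returns the style in effect at 1-based character position Pos.
// Assumes Format is sorted by ascending Index.
function StyleAt(const R: TRichText; Pos: Integer): TStyleFlags;
var
  I: Integer;
begin
  Result := [];
  for I := 0 to High(R.Format) do
  begin
    if R.Format[I].Index > Pos then
      Break;
    Result := R.Format[I].Style;
  end;
end;

var
  R: TRichText;
begin
  R.Text := 'plain bold plain';
  SetLength(R.Format, 2);
  R.Format[0].Index := 7;  R.Format[0].Style := [sfBold]; // "bold" starts here
  R.Format[1].Index := 11; R.Format[1].Style := [];       // back to plain
  WriteLn(sfBold in StyleAt(R, 1));   // FALSE
  WriteLn(sfBold in StyleAt(R, 8));   // TRUE
  WriteLn(sfBold in StyleAt(R, 12));  // FALSE
end.
```

Because only change points are stored, an unformatted string costs nothing beyond its text, which matches the nil-pointer behaviour Martin describes.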
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Marco van de Voort wrote: In our previous episode, listmember said: > else > DrawCaption(0,0,AsUTF8(Caption), flags); > end; > > In other words where is the benefit from OOP in this ? IMO, both are deficient as they both assume that a string block (text) is either RTL or LTR. The assignment only transfers data to the object "TCaption". Doesn't that mean we will be --by design-- unable to write something like 'Yom Kippur (יוֹם כִּפּוּר)' on a caption? TCaption is responsible for rendering. Including LTR and RTL. Not the string. Actually, UTF-8 can contain bidi info, it's indeed a matter of the renderer.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
In our previous episode, listmember said: > > else > > DrawCaption(0,0,AsUTF8(Caption), flags); > > end; > > > > In other words where is the benefit from OOP in this ? > > IMO, both are deficient as they both assume that a string block (text) > is either RTL or LTR. The assignment only transfers data to the object "TCaption". > Doesn't that mean we will be --by design-- unable to write something > like 'Yom Kippur (יוֹם כִּפּוּר)' on a caption? TCaption is responsible for rendering. Including LTR and RTL. Not the string.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
>> procedure TLabel.Paint(...) >> begin >> if *Caption.IsRTL *then >> DrawCaptionRTL(0,0,*Caption.AsUTF8*, flags) >> else >> DrawCaption(0,0,*Caption.AsUTF8*, flags); >> end; >> >> Is not that enough? > > What is the gain as opposed to > > procedure TLabel.Paint(...) > begin >if IsRTL(Caption) then > DrawCaptionRTL(0,0,AsUTF8(Caption), flags) > else > DrawCaption(0,0,AsUTF8(Caption), flags); > end; > > In other words where is the benefit from OOP in this ? IMO, both are deficient as they both assume that a string block (text) is either RTL or LTR. Doesn't that mean we will be --by design-- unable to write something like 'Yom Kippur (יוֹם כִּפּוּר)' on a caption? This is why I keep asking that the 'TCharacter' or 'TChar' needs to have a language attribute.
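The point that direction belongs to runs of text rather than to a whole string can be shown with a small Free Pascal sketch. It uses a deliberately crude, hypothetical classification (Hebrew and Arabic letter blocks as RTL, Latin letters as LTR, everything else neutral) and is nowhere near the Unicode Bidirectional Algorithm (UAX #9); it only counts how many direction runs a caption such as 'Yom Kippur (יום כפור)' contains.

```pascal
program bidiruns;

{$mode objfpc}{$H+}

type
  TDir = (dNeutral, dLTR, dRTL);

// Deliberately crude classification, NOT the Unicode Bidirectional
// Algorithm (UAX #9): Hebrew and Arabic letters count as RTL, Latin
// letters as LTR, and everything else (spaces, digits, punctuation,
// vowel points) as neutral.
function ClassifyChar(C: WideChar): TDir;
begin
  case Ord(C) of
    $0041..$005A, $0061..$007A, $00C0..$024F:
      Result := dLTR;
    $05D0..$05EA, $0620..$064A:
      Result := dRTL;
  else
    Result := dNeutral;
  end;
end;

// Counts runs of strong direction; neutral characters join the current run.
function CountDirectionRuns(const S: UnicodeString): Integer;
var
  I: Integer;
  D, Current: TDir;
begin
  Result := 0;
  Current := dNeutral;
  for I := 1 to Length(S) do
  begin
    D := ClassifyChar(S[I]);
    if (D <> dNeutral) and (D <> Current) then
    begin
      Inc(Result);
      Current := D;
    end;
  end;
end;

// Appends UTF-16 code units, so the example does not depend on the
// encoding of the source file.
procedure AppendUnits(var S: UnicodeString; const Units: array of Word);
var
  I, Start: Integer;
begin
  Start := Length(S);
  SetLength(S, Start + Length(Units));
  for I := 0 to High(Units) do
    S[Start + 1 + I] := WideChar(Units[I]);
end;

var
  S: UnicodeString;
begin
  S := 'Yom Kippur (';
  // Hebrew "Yom Kippur" without vowel points
  AppendUnits(S, [$05D9, $05D5, $05DD, $0020, $05DB, $05E4, $05D5, $05E8]);
  S := S + ')';
  WriteLn(CountDirectionRuns(S));  // 2: one LTR run, one RTL run
end.
```

Whether such run information lives in the string type or is computed by the renderer is exactly what the thread is debating; the sketch only shows that a single per-string RTL flag cannot describe this caption.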
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote: > I've continued to work on support of an unicodestring type in fpc. It's > currently in an svn branch at: > http://svn.freepascal.org/svn/fpc/branches/unicodestring > and will be merged later to trunk. The unicodestring type is a ref. > counted utf-16 string. On non-windows, widestring is mapped to this > type. If you're interested in unicode support please test, give feedback > here and submit fixes. > I have a crash in MSEide startup in a procedure finalization section: " #0 77892373 :0 ??() #1 0082CDF4 :0 U_SYSTEM_ENTRYINFORMATION() #2 03B7FB2C :0 ??() #3 03B7FAAC :0 ??() #4 03C22C1C :0 ??() #5 0082D9F4 :0 U_SYSTEM_FREELISTS() #6 03B7F874 :0 ??() #7 0040F5EB heap.inc:1127 SYSFREEMEM(P=0x0) #8 778922F8 heap.inc:0 ??() #9 0082E500 heap.inc:0 U_HEAPTRC_OWNFILE() #10 00410482 systhrd.inc:300 SYSENTERCRITICALSECTION(CS=void) #11 0040FE94 thread.inc:190 ENTERCRITICALSECTION(CS={DEBUGINFO = 0x0, LOCKCOUNT = 1, RECURSIONCOUNT = 0, OWNINGTHREAD = 0, LOCKSEMAPHORE = 812, SPINCOUNT = 0}) #12 00414571 heaptrc.pp:666 TRACEFREEMEMSIZE(P=0x6d11b8, SIZE=0) #13 004146BB heaptrc.pp:722 TRACEFREEMEM(P=0x6d11b8) #14 0040E404 heap.inc:275 FREEMEM(P=0x6d11b8) #15 004093FA ustrings.inc:179 DISPOSEUNICODESTRING(S=0x6d11b8) #16 0040947D ustrings.inc:206 fpc_unicodestr_decr_ref(S=0x6d11b8) #17 004A9B1C msesysintf.pas:306 WINFILEPATH(DIRNAME=0x0, FILENAME=0x6d11b8, result=0x3c22fa8) #18 004AB63F msesysintf.pas:1436 SYS_OPENDIRSTREAM(STREAM={INFOLEVEL = FIL_NAME, DIRNAME = 0x3c23148, MASK = 0x3b7faa0, INCLUDE = [FA_ALL], EXCLUDE = [], PLATFORMDATA = {0, 208983208, 1, 4294967295, 0, 0, 0, 0}}) #19 004B5BB2 msefileutils.pas:640 SEARCHFILE(AFILENAME=0xc7a07b0, ADIRNAME=0x3c22e08, result=0x0) #20 004B5DED msefileutils.pas:671 SEARCHFILE(AFILENAME=0x6bf2f8, ADIRNAMES=0x3c22ed8, highADIRNAMES=0, result=0x0) #21 004B5F9C msefileutils.pas:698 FINDFILE(FILENAME=0x6bf2f8, DIRNAMES=0x3c22ed8, PATH=0x0, highDIRNAMES=0) #22 004C03E1 
msestatfile.pas:244 TSTATFILE__READSTAT(STREAM=0x0, this=0x3c6b918) #23 00453CF8 main.pas:1514 TMAINFO__MAINONLOADED(SENDER=0x3c03d40, this=0x3c03d40) #24 0050A717 mseforms.pas:854 TCUSTOMMSEFORM__DOEVENTLOOPSTART(this=0x3c03d40) #25 0050A763 mseforms.pas:863 TCUSTOMMSEFORM__RECEIVEEVENT(EVENT=0xc7016f8, this=0x3c03d40) #26 0048CA3A mseevent.pas:213 TOBJECTEVENT__DELIVER(this=0xc7016f8) #27 0042E7D0 msegui.pas:12666 TINTERNALAPPLICATION__EVENTLOOP(AMODALWINDOW=0x0, ONCE=false, this=0x3bd9460) #28 0042F52C msegui.pas:13063 TINTERNALAPPLICATION__DOEVENTLOOP(ONCE=false, this=0x3bd9460) #29 0048B3F8 mseapplication.pas:1132 TCUSTOMAPPLICATION__RUN(this=0x3bd9460) #30 004025D1 mseide.pas:59 main() " I could not find a simple program to demonstrate the failure. Something strange is that the following procedure calls fpc_WideStr_Decr_Ref in finalization section: " const quotechar = unicodechar('"'); procedure requote(var path: unicodestring; const newvalue: unicodestring); begin if punicodechar(path)^ = quotechar then begin path:= quotechar + newvalue; end else begin path:= newvalue; end; end; " I saw that you merged unicodestring to trunk. Should I test with trunk instead of unicodestring branch? Martin
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Some conversion problem occurs and an empty string is put into a TListBox if I try to get a field value with some special characters from a SQL result (using Zeos). For this to happen, the database field only needs to contain a string with '®' in it, for example: 'sometext®'. It seems that ListBox1.Items.Add(SQL1.FieldByName('MyTableField').AsString); or even var s:string; begin s:=SQL1.FieldByName('MyTableField').AsString; ListBox1.Items.Add(s); end; will only put an empty string into the Listbox. Somewhere inside FCL, where the Listbox item is inserted, there is a UTF8Decode which ends up with the empty string: because of the '®' #174 character it thinks it is a UTF-8 multi-byte character and tries to get the additional bytes for it, which ain't there. I used the Lazarus-0.9.25-16495-fpc-2.2.3-20080909-win32.exe build. I'm not sure how this can be circumvented (using some conversion function?) or if it is a bug.
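The empty string is consistent with the byte $AE ('®' in both Latin-1 and CP1252) being invalid on its own in UTF-8: its bit pattern 10101110 marks a continuation byte, so a strict decoder rejects it. A minimal sketch of converting the field value before it reaches a UTF-8 widget follows. Latin1ToUTF8 is a hypothetical helper written for illustration, and it assumes the field data really is Latin-1/CP1252.

```pascal
program latin1utf8;

{$mode objfpc}{$H+}

// Converts a Latin-1 (ISO-8859-1) byte string to UTF-8.
// Caveat: CP1252 differs from Latin-1 in $80..$9F (e.g. the euro sign
// is $80 in CP1252); this sketch treats those bytes as Latin-1.
function Latin1ToUTF8(const S: string): string;
var
  I, J: Integer;
  B: Byte;
begin
  SetLength(Result, Length(S) * 2); // worst case: 2 bytes per input byte
  J := 0;
  for I := 1 to Length(S) do
  begin
    B := Ord(S[I]);
    if B < $80 then
    begin
      Inc(J);
      Result[J] := Chr(B);            // ASCII is unchanged in UTF-8
    end
    else
    begin
      // U+0080..U+00FF become two bytes: 110000xx 10xxxxxx
      Inc(J);
      Result[J] := Chr($C0 or (B shr 6));
      Inc(J);
      Result[J] := Chr($80 or (B and $3F));
    end;
  end;
  SetLength(Result, J);
end;

var
  U: string;
begin
  U := Latin1ToUTF8('sometext' + #174); // '®' as stored in a Latin-1/CP1252 field
  WriteLn(Length(U)); // 10: 8 ASCII bytes + 2 bytes for U+00AE
end.
```

In the Lazarus case above this would amount to ListBox1.Items.Add(Latin1ToUTF8(SQL1.FieldByName('MyTableField').AsString)); whether the conversion belongs in the application or in the database layer is a separate question.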
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Michael Van Canneyt wrote: On Thu, 11 Sep 2008, Anton Kavalenka wrote: Florian Klaempfl wrote: Graeme Geldenhuys wrote: Remember, Unicode support is much more than simply storing and displaying text. You have various encodings, RTL or LTR direction etc. I can't see how a simple type can keep track of all such information - but then, I don't know the internals of FPC either. ;-) How would an OOP approach solve this? The problem isn't the tracking of things like encoding or directions but handling all this information. procedure TLabel.Paint(...) begin if *Caption.IsRTL *then DrawCaptionRTL(0,0,*Caption.AsUTF8*, flags) else DrawCaption(0,0,*Caption.AsUTF8*, flags); end; Is not that enough? What is the gain as opposed to procedure TLabel.Paint(...) begin if IsRTL(Caption) then In other words where is the benefit from OOP in this ? 1) keeping track of info: If you can store the info in an object, you can also store it in a record (afaik). And a string (even the current string) is nothing else but a (hidden) record. It already contains length info and char data. 2) OO style vs functional: Caption.IsRtl may be seen as syntactic sugar. But as far as I can see, it can (almost?) always be translated into functional style. Instead of having child classes you could overload the function for different types of arguments. 3) For the real usage of OO using inheritance: I am not sure if that is a good idea on any kind of *ref-counted* data/object. I can see cases where the full OO power can make sense. But using OO, the objects should not be ref-counted. (IMHO) Ref-counting is mainly used to free memory automatically. People rely on it, and as soon as there are circular references you get memory leaks. Strings as they currently stand cannot contain pointers to other strings, so you cannot get circular references, and ref-counting works. Objects on the other hand can contain any data, including pointers to other objects or self.
Even if the built-in string objects don't contain that kind of pointer, they can be subclassed and people will end up with circular references. Oh, and yes, I am aware that this risk already exists with dynarrays. But there is no need to extend it. So in my opinion, it may be nice to have a library of classes handling all kinds of string (or shall we call it "text") data. But no magic on them. They can use PChar and their own GetMem internally. Martin
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Anton Kavalenka wrote: How would an OOP approach solve this? The problem isn't the tracking of things like encoding or directions but handling all this information. procedure TLabel.Paint(...) begin if *Caption.IsRTL *then DrawCaptionRTL(0,0,*Caption.AsUTF8*, flags) else DrawCaption(0,0,*Caption.AsUTF8*, flags); end; Is not that enough? Sorry, I can't stay out of this. RTL is a control property, not a string property. And AsUTF8 is imo unneeded: procedure TLabel.DrawCaption(ACaption: TUTF8String); begin ... end; procedure TLabel.DrawCaptionRTL(ACaption: TUTF8String); begin ... end; procedure TLabel.Paint(...) begin if RightToLeftAlignment then DrawCaptionRTL(0, 0, Caption, flags) else DrawCaption(0, 0, Caption, flags) end; And Caption can be any desired string type. It will be autoconverted to UTF8String if needed. I see no need for a string class - only unneeded overhead. Best regards, Paul Ishenin.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Thu, 11 Sep 2008, Anton Kavalenka wrote: > Florian Klaempfl wrote: > > Graeme Geldenhuys wrote: > > > > > Remember, Unicode support is much more than simply storing and > > > displaying text. You have various encodings, RTL or LTR direction etc. > > > I can't see how a simple type can keep track of all such information > > > - but then, I don't know the internals of FPC either. ;-) > > > > > > > How would an OOP approach solve this? The problem isn't the tracking of > > things like encoding or directions but handling all this information. > > > procedure TLabel.Paint(...) > begin > if *Caption.IsRTL *then >DrawCaptionRTL(0,0,*Caption.AsUTF8*, flags) > else >DrawCaption(0,0,*Caption.AsUTF8*, flags); > end; > > Is not that enough? What is the gain as opposed to procedure TLabel.Paint(...) begin if IsRTL(Caption) then DrawCaptionRTL(0,0,AsUTF8(Caption), flags) else DrawCaption(0,0,AsUTF8(Caption), flags); end; In other words, where is the benefit from OOP in this? Michael.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Florian Klaempfl wrote: Graeme Geldenhuys wrote: Remember, Unicode support is much more than simply storing and displaying text. You have various encodings, RTL or LTR direction etc. I can't see how a simple type can keep track of all such information - but then, I don't know the internals of FPC either. ;-) How would an OOP approach solve this? The problem isn't the tracking of things like encoding or directions but handling all this information. procedure TLabel.Paint(...) begin if *Caption.IsRTL *then DrawCaptionRTL(0,0,*Caption.AsUTF8*, flags) else DrawCaption(0,0,*Caption.AsUTF8*, flags); end; Is not that enough?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
But it is far more readable when there is a special, reserved type for which we could have special operators and converters, just like those we have for strings and widestrings. Oh, I thought people just complained in this thread that + isn't appropriate for strings anyway ... People are, of course, entitled to their opinions. And I, for one, would never force them to use the '+' operator for any sort of strings against their will. By the same token, the fact that some of us object to '+' should not, IMO, be the basis for not having 4-byte (or 6-byte) per char strings.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
listmember wrote: compiler guys all the same} and ask, instead, to give us reference-counted 4-byte (actually, preferably 6-bytes) per cell arrays/strings. What's wrong with a dyn. array of DWord? Much like what's wrong with dynamic array of Word (as opposed to Widestring) or with dynamic array of byte (as opposed to string), really... Nothing much. But it is far more readable when there is a special, reserved type for which we could have special operators and converters, just like those we have for strings and widestrings. Oh, I thought people just complained in this thread that + isn't appropriate for strings anyway ...
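Florian's suggestion can be tried directly: a dynamic array of DWord holding one code point per element (UCS-4/UTF-32). The sketch below, with hypothetical names TUCS4Array and ToUCS4, decodes a UTF-16 UnicodeString into such an array, combining surrogate pairs.

```pascal
program ucs4sketch;

{$mode objfpc}{$H+}

type
  TUCS4Array = array of DWord; // one element per code point

// Decodes a UTF-16 UnicodeString into code points, combining surrogate
// pairs. Unpaired surrogates are copied through unchanged.
function ToUCS4(const S: UnicodeString): TUCS4Array;
var
  I, N: Integer;
  Hi, Lo: DWord;
begin
  SetLength(Result, Length(S)); // upper bound: one code point per code unit
  N := 0;
  I := 1;
  while I <= Length(S) do
  begin
    Hi := Ord(S[I]);
    if (Hi >= $D800) and (Hi <= $DBFF) and (I < Length(S)) then
    begin
      Lo := Ord(S[I + 1]);
      if (Lo >= $DC00) and (Lo <= $DFFF) then
      begin
        Result[N] := $10000 + ((Hi - $D800) shl 10) + (Lo - $DC00);
        Inc(N);
        Inc(I, 2);
        Continue;
      end;
    end;
    Result[N] := Hi;
    Inc(N);
    Inc(I);
  end;
  SetLength(Result, N);
end;

var
  S: UnicodeString;
  A: TUCS4Array;
begin
  S := 'A';
  SetLength(S, 3);
  S[2] := WideChar($D834); // U+1D11E MUSICAL SYMBOL G CLEF,
  S[3] := WideChar($DD1E); // encoded as a surrogate pair
  A := ToUCS4(S);
  WriteLn(Length(S), ' code units, ', Length(A), ' code points'); // 3 code units, 2 code points
  WriteLn(HexStr(Int64(A[1]), 5)); // 1D11E
end.
```

One design difference to keep in mind: FPC dynamic arrays are reference counted but, unlike strings, not copy-on-write, so after B := A both variables share and mutate the same elements. That is part of what a dedicated string type would add beyond readability.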
Re: [fpc-devel] Unicodestring branch, please test and help fixing
compiler guys all the same} and ask, instead, to give us reference-counted 4-byte (actually, preferably 6-bytes) per cell arrays/strings. What's wrong with a dyn. array of DWord? Much like what's wrong with dynamic array of Word (as opposed to Widestring) or with dynamic array of byte (as opposed to string), really... Nothing much. But it is far more readable when there is a special, reserved type for which we could have special operators and converters, just like those we have for strings and widestrings.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
listmember wrote: >>> But, I could write a gigantic data mining application, a database >>> application >>> or a myriad of such apps that uses the above class without doing a >>> single >>> pixel of GUI stuff. >> >> I'd like to see that: it will be guaranteed dog slow :( > > Hmm.. may be, maybe not. > > Last year I wrote a natural lang parser (Pascal) and gave the source to > a Java developer friend of mine. > > It turned out to be faster in Java --classes and all. For some reason, > using the same algorithm (my code converted to Java, basically), Java > beat my natively compiled code. And, no there was no GUI involved. Without detailed code one can say nothing about it. Just I/O being done wrong can ruin performance. > > compiler guys all the same} and ask, instead, to give us > reference-counted 4-byte (actually, preferably 6-bytes) per cell > arrays/strings. > What's wrong with a dyn. array of DWord?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
But, I could write a gigantic data mining application, a database application or a myriad of such apps that uses the above class without doing a single pixel of GUI stuff. I'd like to see that: it will be guaranteed dog slow :( Hmm... maybe, maybe not. Last year I wrote a natural lang parser (Pascal) and gave the source to a Java developer friend of mine. It turned out to be faster in Java --classes and all. For some reason, using the same algorithm (my code converted to Java, basically), Java beat my natively compiled code. And no, there was no GUI involved. Basing my argument upon this world-shattering anecdotal evidence, I hereby prove my point. So, there :P However, changing the object pascal language, so it requires the use of objects whenever you use strings: this is a different story. And that is what it was all about, after all. Oops! I joined too late then. OK. I retract {I am said to come from a bargaining culture though I have yet to hone my skills with a carpet dealer, but I'll try my luck with compiler guys all the same} and ask, instead, to give us reference-counted 4-byte (actually, preferably 6-bytes) per cell arrays/strings. If I can have such a beast, it will be fast enough and will also cover almost all of the foreseeable problems.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Wed, 10 Sep 2008, listmember wrote: > Michael Van Canneyt wrote: > > You are mixing 2 things. There is the actual string content, and there is > > the > > string metadata. The metadata is something that would apply for flyweight > > pattern. There is nothing to be gained by putting the metadata in an object, > > This is true --up to a point. > > And, that point arises when you wish to be able to work further with a > TCharacter. > > Say, you're doing text processing --display and all. You would definitely like > to be able to derive a new class from TCharacter and call it, say, > TWPCharacter which contains all sorts of other properties, color, style, font, > size etc. > > This would make life immensely easier for such jobs whereby a character may > need to have more attributes than exist in the base class. > > > since there is only the encoding. Storing the encoding in an object is > > ridiculous and a waste of heap space. a 2 byte encoding is less wasteful > > than a 4 or 8 byte object pointer. > > I am afraid I do not agree with this at all. Or rather, it comes across as a > very ANSI-centric view. You are mixing 2 things: - Texts (strings) at the compiler language level. - (complex) GUI design that needs to handle a lot of text and a lot of extra properties. For GUI design, you may well need all the things you describe. And as I said before: you can do this yourself if you need it. But at the _language level_, there is no need for all these things. They make simple usage of the language impossible. To burden the pascal language with all these things would be a serious mistake. But there is nothing that stops people from doing all these things for themselves if they require it. Michael.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Wed, 10 Sep 2008, listmember wrote: > Michael Van Canneyt wrote: > > You are mixing 2 things: > > > > - Texts (strings) at the compiler language level. > > - (complex) GUI design that needs to handle a lot of text and a lot of extra > >properties. > > :) > > If you draw the lines so red and thick, who am I to disagree... > > But, I could write a gigantic data mining application, a database application > or a myriad of such apps that uses the above class without doing a single > pixel of GUI stuff. I'd like to see that: it will be guaranteed dog slow :( But that is not the point. > > For GUI design, you may well need all the things you describe. > > And as I said before: you can do this yourself if you need it. > > True. > > I could also do my own TList, TStringList etc. etc. but, luckily I don't have > to. > > I was under the impression, therefore, that stuff that makes life easier for a > number of developers get to be included into the main distribution for common > use; and not be rejected on the basis of /language level/ . This is another discussion: we could very well decide to include such a string/character handling class in the RTL or FCL, and you could use it. I never said we would refuse such a set of classes. If certain - generally useful - language features are needed to implement such a set of classes, we could even decide to do that. I imagine that auto class instance destruction when the variable goes out of scope, is one of them. I have proposed something like it years ago, because it is broader in scope. However, changing the object pascal language, so it requires the use of objects whenever you use strings: this is a different story. And that is what it was all about, after all. Michael.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
In our previous episode, listmember said: > > - Texts (strings) at the compiler language level. > > - (complex) GUI design that needs to handle a lot of text and a lot of extra > >properties. > > If you draw the lines so red and thick, who am I to disagree... > > But, I could write a gigantic data mining application, a database > application or a myriad of such apps that uses the above class without > doing a single pixel of GUI stuff. True, and the ability to customize the string type on even the character level would downright kill performance. Because the compiler can't really exploit knowledge that it can now (basic 16-bit array on the base level) > > For GUI design, you may well need all the things you describe. > > And as I said before: you can do this yourself if you need it. > > True. > > I could also do my own TList, TStringList etc. etc. but, luckily I don't > have to. Don't get me started on tstringlist. > I was under the impression, therefore, that stuff that makes life easier > for a number of developers get to be included into the main distribution > for common use; Yes. But most specially it should make life impossible for a group of other developers. (like the ones that have to process a multi GB database export regularly)
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Michael Van Canneyt wrote: You are mixing 2 things: - Texts (strings) at the compiler language level. - (complex) GUI design that needs to handle a lot of text and a lot of extra properties. :) If you draw the lines so red and thick, who am I to disagree... But, I could write a gigantic data mining application, a database application or a myriad of such apps that uses the above class without doing a single pixel of GUI stuff. For GUI design, you may well need all the things you describe. And as I said before: you can do this yourself if you need it. True. I could also do my own TList, TStringList etc. etc. but, luckily I don't have to. I was under the impression, therefore, that stuff that makes life easier for a number of developers get to be included into the main distribution for common use; and not be rejected on the basis of /language level/ .
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Yes, but most proposals here about a TCharacter are a bit overkill. For example, a language reference for a given char is not very important from a Unicode point of view; Unicode focuses its power on the text, so locale is important in context operations and collations. See my other post above. Locale should really have nothing to do with the text/string business. Instead, it should only refer to oddities such as decimal number representations, thousands separators, date and time strings etc. Packing the language into the 'locale' info is an abuse IMO, unless it refers to such things as what kind of help file it should display to the user or the actual strings on menu items (resources) etc. From my point of view the compiler's basic types must stay "basic": fast, using no more memory than needed, and so on. Please don't take offence, but this kind of attitude is verging on being offensive. Instead of looking at the issue from the POV of "I don't need it" or "It requires more hardware resources", can't you try to evaluate the need on its own merit? And, if you still think that you will never need it, please remember that you don't have to --but others may. Bringing Unicode "power" to the basic string type is overkill: any Unicode operation will, in the best case, take twice the time, and some of them will be 40-50 times slower. A simple collation will take at least 4 times the memory needed by the string itself, and for most sort algorithms the collation is unnecessary. So? What if it is a fact of life? Such as 24-bit graphics. We all know it takes a lot more resources and that only patsies need that much color; we ended up using it. Can't you consider this Unicode character business in the same light? (no pun).
So think of a "new" user filling a TStringList with 1000 strings and invoking the Sort method: as the strings are Unicode they must be ordered using the locale collation or the general collation, and the user finally says "20 seconds to sort 1000 strings, this looks even worse than javascript". No. This is where you are mistaken, I'm afraid. A TUnicodeStringList can contain strings from different collations, and a single 'locale' setting will be useless in sorting out that mess. You need 'language' information in each of those strings to be able to properly sort that unicode list. Maybe, again from my point of view, it is more logical to create "TTextUnicodeChar" and "TTextUnicodeString" classes which handle Unicode textual data, not Unicode data. I can't see how you can do that. I can't see how we can cater for unicode data (not textual data, as you put it) in anything other than a specific class [or data type]. PS: As one of the problems of Unicode support is the big amount of data that must be stored (in the exe or an external file), is there any recommended coding approach so that unused arrays are left out when the function that uses that array is never called in the main program? Storage is a completely different problem. You could use, say, UTF-8 encoding and also store the language information when necessary.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Michael Van Canneyt wrote: You are mixing 2 things. There is the actual string content, and there is the string metadata. The metadata is something that would apply for flyweight pattern. There is nothing to be gained by putting the metadata in an object, This is true --up to a point. And, that point arises when you wish to be able to work further with a TCharacter. Say, you're doing text processing --display and all. You would definitely like to be able to derive a new class from TCharacter and call it, say, TWPCharacter which contains all sorts of other properties, color, style, font, size etc. This would make life immensely easier for such jobs whereby a character may need to have more attributes than exist in the base class. since there is only the encoding. Storing the encoding in an object is ridiculous and a waste of heap space. a 2 byte encoding is less wasteful than a 4 or 8 byte object pointer. I am afraid I do not agree with this at all. Or rather, it comes across as a very ANSI-centric view. You definitely need a 'language' attribute for a character. 'Locale' does not cut it simply because you can have mixed text, i.e. portions that belong to a different language. Some weird characters in my locale (say, Turkish) do not necessarily mean that that piece of string is in another language --it may well be a transcription of /my/ name in a different character set (say, Greek). Yet, we all know that (upper-, lower-, title-) casing has nothing to do with the encoding; nor does collation order etc. In the above example, I used Turkish and Greek {what an unfortunate pairing, some might say :) } on purpose: both of them follow their own case folding rules, as well as their own collation orders, which are both dependent upon a language attribute/property. Without a language attribute, how would you handle these sorts of issues? Using a parallel byte array? Really?
Wouldn't it be a lot more humane to us developers if the TCharacter had properties such as -- Language -- CollationOrder -- UpperCase -- LowerCase -- TitleCase where, on setting the Language property, all others get filled with their correct values and are read-only? Cheers, Adem
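The Turkish casing example can be made concrete. In Turkish, lower-case 'i' upper-cases to U+0130 (İ), while the dotless U+0131 (ı) upper-cases to plain 'I'; the default mapping sends 'i' to 'I'. The sketch below passes the language as an explicit, hypothetical parameter (TCaseLanguage, UpperWithLanguage), which is one way to keep such information outside the string type; it handles only ASCII letters plus U+0131 and is not a full Unicode case mapper.

```pascal
program turkishcase;

{$mode objfpc}{$H+}

type
  TCaseLanguage = (clDefault, clTurkish); // hypothetical language tag

// Upper-cases a UTF-16 string, applying the Turkish (and Azeri) dotted/dotless i
// rule when Lang = clTurkish. Only ASCII letters and U+0131 are handled;
// real code would use the full Unicode case-mapping tables.
function UpperWithLanguage(const S: UnicodeString; Lang: TCaseLanguage): UnicodeString;
var
  I: Integer;
  C: WideChar;
begin
  Result := S; // writing Result[I] below makes the copy unique first
  for I := 1 to Length(Result) do
  begin
    C := Result[I];
    if (Lang = clTurkish) and (C = 'i') then
      Result[I] := WideChar($0130)        // i -> U+0130 (capital I with dot above)
    else if C = WideChar($0131) then
      Result[I] := 'I'                    // U+0131 (dotless i) -> I, in any language
    else if (C >= 'a') and (C <= 'z') then
      Result[I] := WideChar(Ord(C) - 32); // plain ASCII upper-casing
  end;
end;

var
  U: UnicodeString;
begin
  U := UpperWithLanguage('istanbul', clTurkish);
  WriteLn(Ord(U[1]));  // 304 (U+0130)
  U := UpperWithLanguage('istanbul', clDefault);
  WriteLn(Ord(U[1]));  // 73 ('I')
end.
```

The same string yields different results depending on the language, which is the point being argued; whether that language should be a per-character property or a parameter of the operation is the open question in this thread.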
Re: [fpc-devel] Unicodestring branch, please test and help fixing
In our previous episode, Graeme Geldenhuys said: > > The problem is how it applies to strings, and how they can be more > > memory saving than a straight array of 16-bit values which are > > copy-on-write. > > I think for a good code example of this, have a look at Java's > Document class. It's not exactly what I'm talking about, but it's got > the idea. The Document class forms the basic storing medium of all > their text based components - from a simple TextEdit, TextArea to > complex rich text documents. So it scales well. How can you say that? The limit is if a person notices it, but a main string type must also be used for server systems that import a several GB database export. > Each character can have individual characteristics set. Storage down > to character level. Similar to what I am suggesting with the Flyweight > pattern - characters of a string with encoding information. I can't see how you could stuff that into less than 16 bits (since that would be the storage now, and you said it would save memory). > The Document class also uses an internal gapped buffer implementation > to store its content - apparently good for performance. It is one of many ways to avoid big delays on big continuous documents. E.g. Word (classically) uses a different approach, where the document is a set of references to paragraphs. That way you can swap entire paragraphs by manipulating a few pointers. It is also totally unrelated to string handling. > Again something like this could be used in the "character pool" manager > object - though I'm not 100% sure. Which, what, where, why character pool manager object? How? > Please note, this is just a thought. I haven't written any Object > Pascal code implementing something like this - to prove the concept. I > simply know the Flyweight pattern and it seems to be a possible > option. And we are trying to get to the bottom of that feeling.
Let me summarize this ENTIRE discussion up to now (this also goes for the other posters):
1a) objects -> good
1b) not object -> not good
2) flyweight pattern will be a good string type.
3) A "+" for string concatenation is frowned upon in good OOP circles.
4) The Java string type is an immutable object.
5) C++ _possibly_ has some problems efficiently coding s[x] using class string types.
Which, for practical relevance to the unicodestring type, can be further summarized to the empty set. So in short: while I'm not entirely fond of an OOP approach to strings (simply because I have never seen one that fits in with a language like Delphi/FPC), I'm willing to hear the arguments. But we are now several tens of posts into this subthread, and there has been absolutely no information at all! > Remember, Unicode support is much more than simply storing and > displaying text Displaying text is already pretty much out of the scope of the unicodestring type that is the subject of this thread.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Graeme Geldenhuys wrote: > Remember, Unicode support is much more than simply storing and > displaying text. You have various encodings, RTL or LTR direction etc. > I can't see how a simple type can keep track of all such information > - but then, I don't know the internals of FPC either. ;-) How would an OOP approach solve this? The problem isn't the tracking of things like encoding or direction, but handling all this information.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Wed, 10 Sep 2008, Graeme Geldenhuys wrote: > On 9/10/08, Marco van de Voort <[EMAIL PROTECTED]> wrote: > > > > Like everybody, I have read GOF several times, and even got some of the > > successor books. > > I don't think anybody has read GOF only once. :-) > > > > The problem is how it applies to strings, and how they can be more > > memory saving than a straight array of 16-bit values which are > > copy-on-write. > > I think for a good code example of this, have a look at Java's > Document class. It's not exactly what I'm talking about, but it's got > the idea. The Document class forms the basic storing medium of all > their text based components - from a simple TextEdit, TextArea to > complex rich text documents. So it scales well. > > Each character can have individual characteristics set. Storage down > to character level. Similar to what I am suggesting with the Flyweight > pattern - characters of a string with encoding information. > > The Document class also uses an internal gapped buffer implementation > to store its content - apparently good for performance. Again > something like this could be used in the "character pool" manager > object - though I'm not 100% sure. > > > Please note, this is just a thought. I haven't written any Object > Pascal code implementing something like this - to prove the concept. I > simply know the Flyweight pattern and it seems to be a possible > option. > > Remember, Unicode support is much more than simply storing and > displaying text. You have various encodings, RTL or LTR direction etc. > I can't see how a simple type can keep track of all such information > - but then, I don't know the internals of FPC either. ;-) You are mixing two things. There is the actual string content, and there is the string metadata. The metadata is what the flyweight pattern would apply to. There is nothing to be gained by putting the metadata in an object, since there is only the encoding.
Storing the encoding in an object is ridiculous and a waste of heap space. A 2-byte encoding field is less wasteful than a 4- or 8-byte object pointer. The main problem with the GOF book is that "If your only tool is a hammer, you tend to think of every problem as a nail." Objects are not the ne plus ultra of programming. They are useful in a very broad area, but not everything should be done in objects, because they do add overhead. Strings are such a case where objects are simply too cumbersome. Michael.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On 9/10/08, Marco van de Voort <[EMAIL PROTECTED]> wrote: > > Like everybody, I have read GOF several times, and even got some of the > successor books. I don't think anybody has read GOF only once. :-) > The problem is how it applies to strings, and how they can be more > memory saving than a straight array of 16-bit values which are > copy-on-write. I think for a good code example of this, have a look at Java's Document class. It's not exactly what I'm talking about, but it's got the idea. The Document class forms the basic storing medium of all their text based components - from a simple TextEdit, TextArea to complex rich text documents. So it scales well. Each character can have individual characteristics set. Storage down to character level. Similar to what I am suggesting with the Flyweight pattern - characters of a string with encoding information. The Document class also uses an internal gapped buffer implementation to store its content - apparently good for performance. Again something like this could be used in the "character pool" manager object - though I'm not 100% sure. Please note, this is just a thought. I haven't written any Object Pascal code implementing something like this - to prove the concept. I simply know the Flyweight pattern and it seems to be a possible option. Remember, Unicode support is much more than simply storing and displaying text. You have various encodings, RTL or LTR direction etc. I can't see how a simple type can keep track of all such information - but then, I don't know the internals of FPC either. ;-) Regards, - Graeme - ___ fpGUI - a cross-platform Free Pascal GUI toolkit http://opensoft.homeip.net/fpgui/
Re: [fpc-devel] Unicodestring branch, please test and help fixing
In our previous episode, Graeme Geldenhuys said: > > this ever save memory? > > Please read the following... > > http://exciton.cs.rice.edu/JavaResources/DesignPatterns/FlyweightPattern.htm > > http://en.wikipedia.org/wiki/Flyweight_pattern > > Design Patterns - Elements of Reusable Object-Oriented Software > (aka GOF book) > "Most contemporary document editors don't use an object for every > character, presumably for efficiency reasons. Calder demonstrated that > this approach is feasible in his thesis [Cal93]. Calder's glyphs can > be shared to reduce storage costs, thereby forming directed-acyclic > graph structures. We can apply the Flyweight pattern to get the same > effect." ― A Case Study (chapter) > > [Cal93] - Paul R. Calder. Building User Interfaces with Lightweight > Objects. PhD thesis, Stanford University, 1993. Like everybody, I have read GOF several times, and even got some of the successor books. The problem is how it applies to strings, and how they can be more memory saving than a straight array of 16-bit values which are copy-on-write.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Quoting Graeme Geldenhuys <[EMAIL PROTECTED]>: > On 9/10/08, Micha Nelissen <[EMAIL PROTECTED]> wrote: > > > TCharacter and TString to be more intelligent with what encoding it > > > represents etc... And if you have an application with many strings, it > > > might actually save memory, because flyweight objects are used from a > > > pool. > > > > > > > Save memory? > > 1) storing information for each character > > 2) pool retains old classes I assume, so consumes unused memory; how can > > this ever save memory? > > Please read the following... > > http://exciton.cs.rice.edu/JavaResources/DesignPatterns/FlyweightPattern.htm > > http://en.wikipedia.org/wiki/Flyweight_pattern > > Design Patterns - Elements of Reusable Object-Oriented Software > (aka GOF book) > "Most contemporary document editors don't use an object for every > character, presumably for efficiency reasons. Calder demonstrated that > this approach is feasible in his thesis [Cal93]. Calder's glyphs can > be shared to reduce storage costs, thereby forming directed-acyclic > graph structures. We can apply the Flyweight pattern to get the same > effect." ― A Case Study (chapter) This is about glyphs, not values of characters. > [Cal93] - Paul R. Calder. Building User Interfaces with Lightweight > Objects. PhD thesis, Stanford University, 1993. > > > Also related to your point (2). Reference counted objects can be used. > So "old" objects get freed automatically. The reference will need at least a UTF18 sized value, which for speed reasons will probably result in 3 bytes. So for human readable texts the memory will be comparable to non class based unicode strings. It does not save memory, but it does not cost more either. But IMO it costs a lot of speed. This is not so important for text editors, where the glyphs, unicode, rtl, tabs, ... processing takes the biggest part of the time. For all other string algorithms I need the speed of arrays and base types.
Mattias
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On 9/10/08, Graeme Geldenhuys <[EMAIL PROTECTED]> wrote: > > Please read the following... > > http://exciton.cs.rice.edu/JavaResources/DesignPatterns/FlyweightPattern.htm > > http://en.wikipedia.org/wiki/Flyweight_pattern > > Design Patterns - Elements of Reusable Object-Oriented Software > (aka GOF book) > "Most contemporary document editors don't use an object for every > character, presumably for efficiency reasons. Calder demonstrated that > this approach is feasible in his thesis [Cal93]. Calder's glyphs can > be shared to reduce storage costs, thereby forming directed-acyclic > graph structures. We can apply the Flyweight pattern to get the same > effect." ― A Case Study (chapter) > > [Cal93] - Paul R. Calder. Building User Interfaces with Lightweight > Objects. PhD thesis, Stanford University, 1993. I forgot to add the reference to the Flyweight Pattern (falls under Structural Patterns) on page 195 in GOF book. Regards, - Graeme -
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On 9/10/08, Micha Nelissen <[EMAIL PROTECTED]> wrote: > > TCharacter and TString to be more intelligent with what encoding it > > represents etc... And if you have an application with many strings, it > > might actually save memory, because flyweight objects are used from a > > pool. > > > > Save memory? > 1) storing information for each character > 2) pool retains old classes I assume, so consumes unused memory; how can > this ever save memory? Please read the following... http://exciton.cs.rice.edu/JavaResources/DesignPatterns/FlyweightPattern.htm http://en.wikipedia.org/wiki/Flyweight_pattern Design Patterns - Elements of Reusable Object-Oriented Software (aka GOF book) "Most contemporary document editors don't use an object for every character, presumably for efficiency reasons. Calder demonstrated that this approach is feasible in his thesis [Cal93]. Calder's glyphs can be shared to reduce storage costs, thereby forming directed-acyclic graph structures. We can apply the Flyweight pattern to get the same effect." ― A Case Study (chapter) [Cal93] - Paul R. Calder. Building User Interfaces with Lightweight Objects. PhD thesis, Stanford University, 1993. Also related to your point (2). Reference counted objects can be used. So "old" objects get freed automatically. Regards, - Graeme -
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Graeme Geldenhuys wrote: TCharacter and TString to be more intelligent with what encoding it represents etc... And if you have an application with many strings, it might actually save memory, because flyweight objects are used from a pool. Save memory? 1) storing information for each character 2) pool retains old classes I assume, so consumes unused memory; how can this ever save memory? Micha
Re: [fpc-devel] Unicodestring branch, please test and help fixing
I fully agree with you. I would like the object oriented way of strings also - but I stopped asking for that ;) There are a lot of advantages over the small amount of disadvantages. Of course I don't like this one: S := TString.Create(''); But a built-in class TString that is managed by the compiler. PS: Maybe I'm a little bit more up to date about today's concepts of object oriented languages - maybe because I know him personally http://en.wikipedia.org/wiki/Bertrand_Meyer There were a lot of interesting discussions, etc... although I don't like Eiffel :) and also this guy was one of my profs: http://en.wikipedia.org/wiki/Niklaus_Wirth greetings

Yet another approach: var s:string; intStr:TInternalStringClass absolute s; TInternalStringClass(s).AsUTF8:='Some string'; writeln('String length=',intStr.length); TMyStringClass=class(TInternalStringClass) class function LoadFromResource(nResId:integer) end; intStr.LoadFromResource(nResId);
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Marco van de Voort wrote: > In our previous episode, Ivo Steinmann said: >> I fully agree with you. I would like the object oriented way of strings >> also - but I stopped asking for that ;) There are a lot of advantages >> over the small amount of disadvantages. Of course I don't like this one: >> >> S := TString.Create(''); >> >> But a built-in class TString that is managed by the compiler. >> >> PS: Maybe I'm a little bit more up to date about today's concepts of >> object oriented languages - maybe because I know him personally >> http://en.wikipedia.org/wiki/Bertrand_Meyer >> There were a lot of interesting discussions, etc... although I don't like >> Eiffel :) > > > I think it is less the object orientation but the possible customization > that is interesting. Did you ever see anybody using the variants customization :)?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
In our previous episode, Ivo Steinmann said: > I fully agree with you. I would like the object oriented way of strings > also - but I stopped asking for that ;) There are a lot of advantages > over the small amount of disadvantages. Of course I don't like this one: > > S := TString.Create(''); > > But a built-in class TString that is managed by the compiler. > > PS: Maybe I'm a little bit more up to date about today's concepts of > object oriented languages - maybe because I know him personally > http://en.wikipedia.org/wiki/Bertrand_Meyer > There were a lot of interesting discussions, etc... although I don't like > Eiffel :) I think it is less the object orientation but the possible customization that is interesting. But I never liked an existing solution better than the solution we have now. And while I have not seen all languages, I've seen more than a few. Moreover, what always strikes me (and apparently Florian too, judging by his reaction) is the total lack of (detailed) arguments. All we have till now are Anton's two lines of pseudo code. If people are really interested in this, the least you can do is come up with real evidence and comparisons, and not with just a few gratuitous soundbites. If you learned so much from Meyer, write something about it. See also http://www.freepascal.org/faq.var#extensionselect Use the wiki for all I care, but do something, and be prepared to find solutions for criticism.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Wed, Sep 10, 2008 at 7:15 AM, listmember <[EMAIL PROTECTED]> wrote: > 1) since each character is a class, memory requirements are increased > several fold. > > 2) Again, the character-as-class also means that the speed with which we can > create and destroy (and manipulate) a string is a lot slower. I'm not saying that FPC needs to do any of this, I am simply commenting from my experience in OOP. Both the above issues could be addressed (or minimised) by using the Flyweight and Proxy design patterns. See the GOF (Gang of Four) book where they create a rich text editor. In your example, a TCharacter instance could be shared between other occurrences of the same character in a single string (TString), or maybe even have a global pool of TCharacters shared between many strings. When no TString is referencing a TCharacter instance in the pool, it can be freed. Just like reference counted objects. I don't know the internals of FPC, so I can't say if this is more or less work than the current implementations. But it does allow TCharacter and TString to be more intelligent with what encoding it represents etc... And if you have an application with many strings, it might actually save memory, because flyweight objects are used from a pool. Regards, - Graeme -
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Graeme Geldenhuys wrote: I have to say I agree with you. The Object Pascal / Delphi language already has way too many string types! And it's just getting worse. I've always liked the Java style of everything being an object - even the string type. The more I look at this Unicode issue, the more I believe we need a fundamental object approach to it. I mean, before a TString class, we need a TCharacter class in which we need to specify --amongst other things-- what language that character belongs to. This kind of information is needed in order to properly manage the (upper-, lower-, title-, and camel-?) casing issues. On top of this, we also need this information in order to be able to mix and match and display the LTR (left-to-right) and RTL (right-to-left) pieces of strings within the same string. I have done some work on this, but there are at least 2 issues: 1) since each character is a class, memory requirements are increased several fold. 2) Again, the character-as-class also means that the speed with which we can create and destroy (and manipulate) a string is a lot slower. I am, at this point, wondering if FPC's object creation/destroy code could be more optimized to be faster to help with this issue. 3) How do you handle the character sets when characters are objects?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
peter green wrote: I have just checked the manual and I don't see anything I can use to make sure my custom type starts at a predictable state initially (necessary so the assignment operator can safely clean up before making the assignment). Nor do I see anything to do automatic clean up when the variable goes out of scope. That's the point. You don't have to! With the java system the string type is immutable anyway so there is no point in doing a deep copy. Which is imo - in the case of Java, but especially in the case of c++ - proven to be not a very smart idea. You want both and you want them recognizable by the compiler.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Jonas Maebe wrote: On 09 Sep 2008, at 21:37, Florian Klaempfl wrote: Even C++'s is not good enough to do a ref. counted string in an efficient way. Just consider the [...] operator which needs to distinguish between reads and writes to avoid unnecessary unique calls. Can't you have a const and non-const version of the [] operator in C++? I tried something similar once with an older gcc but I didn't get it working. Maybe it's possible with newer ones.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Check again... I have just checked the manual and I don't see anything I can use to make sure my custom type starts at a predictable state initially (necessary so the assignment operator can safely clean up before making the assignment). Nor do I see anything to do automatic clean up when the variable goes out of scope. But it is still a bad idea (like c++). How does one recognize a deep vs. shallow string copy, for example? You don't have to! With the java system the string type is immutable anyway so there is no point in doing a deep copy. With the delphi/fpc system the string type automatically makes a shallow copy initially and then copies the actual data if and when it becomes necessary to do so.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On 09 Sep 2008, at 21:37, Florian Klaempfl wrote: Even C++'s is not good enough to do a ref. counted string in an efficient way. Just consider the [...] operator which needs to distinguish between reads and writes to avoid unnecessary unique calls. Can't you have a const and non-const version of the [] operator in C++? Jonas
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Martin Schreiber wrote: On Sunday 07 September 2008 21.23:24 Florian Klaempfl wrote: Trunk 11723 does not compile: Trunk or unicodestring branch? Strange, because here it works? Unicodestring branch, sorry, I should change the directory name of my switched checkout. Does your unicodestring branch compile? Fixed in rev. 11734
Re: [fpc-devel] Unicodestring branch, please test and help fixing
peter green wrote: 3: Use an automatic reference counting system either implemented in the compiler (the delphi/fpc way) or implemented using a very powerful operator overloading system (the C++ way, last I checked freepascal did not have sufficient operator overloading capabilities to implement this) Even C++'s is not good enough to do a ref. counted string in an efficient way. Just consider the [...] operator which needs to distinguish between reads and writes to avoid unnecessary unique calls.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
peter green wrote: I fully agree with you. I would like the object oriented way of strings also - but I stopped asking for that ;) There are a lot of advantages over the small amount of disadvantages. Which object oriented way of doing strings? As I see it there are three main ways of doing variable length strings.
1: Let the programmer manage the memory lifetime (the C way). This is tedious and error prone, and generally results in lots of unnecessary copying of strings, since it is easier for the programmer to have separate copies owned by different objects than to manage shared strings.
2: Use immutable objects and let the garbage collector clean them up (the java way). This works, but since the strings are immutable they must be copied to make any modification. It also relies on a garbage collector with all its associated problems.
3: Use an automatic reference counting system either implemented in the compiler (the delphi/fpc way) or implemented using a very powerful operator overloading system (the C++ way; last I checked freepascal did not have sufficient operator overloading capabilities to implement this).
Check again... But it is still a bad idea (like c++). How does one recognize a deep vs. shallow string copy, for example? This is really basic. And rather uninformed as well.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
I fully agree with you. I would like the object oriented way of strings also - but I stopped asking for that ;) There are a lot of advantages over the small amount of disadvantages. Which object oriented way of doing strings? As I see it there are three main ways of doing variable length strings.
1: Let the programmer manage the memory lifetime (the C way). This is tedious and error prone, and generally results in lots of unnecessary copying of strings, since it is easier for the programmer to have separate copies owned by different objects than to manage shared strings.
2: Use immutable objects and let the garbage collector clean them up (the java way). This works, but since the strings are immutable they must be copied to make any modification. It also relies on a garbage collector with all its associated problems.
3: Use an automatic reference counting system either implemented in the compiler (the delphi/fpc way) or implemented using a very powerful operator overloading system (the C++ way; last I checked freepascal did not have sufficient operator overloading capabilities to implement this).
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Ivo Steinmann wrote: I fully agree with you. I would like the object oriented way of strings also - but I stopped asking for that ;) There are a lot of advantages Which ones? Really, I want to know :)
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Anton Kavalenka wrote: > Florian Klaempfl wrote: >> I've continued to work on support of an unicodestring type in fpc. >> It's currently in an svn branch at: >> http://svn.freepascal.org/svn/fpc/branches/unicodestring >> and will be merged later to trunk. The unicodestring type is a ref. >> counted utf-16 string. On non-windows, widestring is mapped to this >> type. If you're interested in unicode support please test, give >> feedback here and submit fixes. >> >> An existing working copy of trunk can be switched to this branch by >> cd fpc >> svn switch http://svn.freepascal.org/svn/fpc/branches/unicodestring >> and back with >> svn switch http://svn.freepascal.org/svn/fpc/trunk > The Pascal huge strings always annoy me. > Since - it is IMPLICIT automatic object with set of overloaded > methods, length and reference count fields etc hidden from developer. > > In near future we get a zoo of strings: > AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar > Some of them with encoding field. > > Why not to make it EXPLICIT object > > s:=TCoolFPCString.Create('Test'); > s2:=TCoolFPCString.Create('Проверка'); //UTF8 encoded constant > s.asUtf8+=s2; > > SetWindowTextW(WinHandle,s.AsUnicodeString); // i explicitly say - get > me wide string and DO not any compiler magic > > if (s1.length=length(s2))... // generic runtime function length > returns the property of cool object > > s1.AcquireLock // prevent other threads access > s1.Clear; > s1.LoadFromResource(n_ReasourceId); // just use GNU gettext > s1.LoadTranslationFromResource(n_resID,'be_BY'); > s1.ReleaseLock // allow other thread access > > Anyway, I can just subclass the standard string and get new functionality > with a richness of available fields and methods.
> > > FPC supports operators - so there are lots of ways to represent the > string, assign the string, load it from resource. > Make it thread-safe at implementation level but not at compiler level. > Standard string, unicode string, ansistring, widestring can be > implemented as wrappers around this object. > It seems like in mseGUI it is done. I fully agree with you. I would like the object oriented way of strings also - but I stopped asking for that ;) There are a lot of advantages over the small amount of disadvantages. Of course I don't like this one: S := TString.Create(''); But a built-in class TString that is managed by the compiler. PS: Maybe I'm a little bit more up to date about today's concepts of object oriented languages - maybe because I know him personally http://en.wikipedia.org/wiki/Bertrand_Meyer There were a lot of interesting discussions, etc... although I don't like Eiffel :) and also this guy was one of my profs: http://en.wikipedia.org/wiki/Niklaus_Wirth greetings
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Anton Kavalenka wrote: I only have a dream - controllable way of string assignment without any magic like implicit call of _LStrAddRefCnt Do you have a real-world sample of usage, ie, where or when the object pascal way is a problem? Joao Morais
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Michael Van Canneyt wrote: On Tue, 9 Sep 2008, Anton Kavalenka wrote: Nothing stops you from doing this yourself. But for something as basic as text operations, I think this is bloat. Imagine that you would have to do I:=TInteger.Create(1); J:=TInteger.Create(2); I.Add(J); What kind of language do you end up with then? Utterly unreadable, and slow, because heavily relying on the heap. Michael.

Bad example. Numbers are scalars, strings are vectors. The += operator is not as straightforward as for numbers. Bad example for you, but not for me: Handling strings should be as easy as handling integers. Who else except Pascal developers knows that s:=s1+s2 is the string concatenation and invokes a lot of hidden stuff that is out of control? This is the beauty of pascal: you don't need to know, and there should be no need. I once asked a C++ programmer how to read a file full of strings. After 2 hours he came to tell me he didn't know. In Pascal, it takes about 1 minute to code, because strings are a basic type, handled on the stack. And rightly so. Michael.

:-) This is not a holy war of C++ vs Pascal. If a C++ programmer doesn't know about fstream descendants - send him back to school (or actually (he|she) is a VB programmer). I only have a dream - controllable way of string assignment without any magic like implicit call of _LStrAddRefCnt
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Tue, Sep 9, 2008 at 2:23 PM, Graeme Geldenhuys <[EMAIL PROTECTED]> wrote: > On 9/9/08, Anton Kavalenka <[EMAIL PROTECTED]> wrote: >> The Pascal huge strings always annoy me, >> since each one is an IMPLICIT automatic object with a set of overloaded methods, >> with length and reference count fields etc. hidden from the developer. >> >> In the near future we get a zoo of strings: >> AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar, >> some of them with an encoding field. > > I have to say I agree with you. The Object Pascal / Delphi language > already has way too many string types! And it's just getting worse. Actually I find this to be a good feature. In C, for example, you will find a lot of typedefs that resolve to int or long int, and you can tell that time_t is about working with time, while pid_t is about process IDs, etc. They are all integer types, but it is easier to understand their uses. Sure, it means that you need better documentation out there, but I think it is worth it. > > I've always liked the Java style of everything being an object - even > the string type. That is exactly the thing I dislike in Java. In languages such as Ruby/Python everything is a true object (including nil in Ruby), but you do not "need" that when you do not use its methods, and therefore languages like Java and C++ become bloatware, because they have way too much information to compile into the binary. In Pascal (using smart linking) you include only the things you use (but with OO it does not work like that). Ido -- http://ik.homelinux.org/
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Tue, 9 Sep 2008, Anton Kavalenka wrote: > > > Nothing stops you from doing this yourself. > > > > But for something as basic as text operations, I think this is bloat. > > > > Imagine that you would have to do > > I:=TInteger.Create(1); > > J:=TInteger.Create(2); > > I.Add(J); > > What kind of language do you end up with then? Utterly unreadable, and > > slow, because heavily relying on the heap. > > > > Michael. > > Bad example. > Numbers are scalars, > strings are vectors; > the += operator is not as straightforward as for numbers. Bad example for you, but not for me: handling strings should be as easy as handling integers. > > Who else except Pascal developers knows that s:=s1+s2 is string > concatenation and invokes a lot of hidden stuff that is out of control? This is the beauty of Pascal: you don't need to know, and there should be no need. I once asked a C++ programmer how to read a file full of strings. After 2 hours he came to tell me he didn't know. In Pascal, it takes about 1 minute to code, because strings are a basic type, handled on the stack. And rightly so. Michael.
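Michael's "about 1 minute" claim is easy to illustrate. A minimal sketch, assuming FPC with the SysUtils unit; the CountLines helper and the temporary demo file are made up for the example, which writes three lines and reads them back.

```pascal
program readlines;
{$mode objfpc}{$H+}
{$C+} // enable Assert checks

uses
  SysUtils;

// Hypothetical helper: read a text file line by line and count the lines.
function CountLines(const FileName: string): Integer;
var
  F: TextFile;
  Line: string; // a managed AnsiString: no manual allocation or freeing
begin
  Result := 0;
  AssignFile(F, FileName);
  Reset(F);
  try
    while not Eof(F) do
    begin
      ReadLn(F, Line);
      Inc(Result);
    end;
  finally
    CloseFile(F);
  end;
end;

var
  F: TextFile;
  Path: string;
begin
  // Write a small demo file first so the example is self-contained
  Path := GetTempFileName;
  AssignFile(F, Path);
  Rewrite(F);
  WriteLn(F, 'first');
  WriteLn(F, 'second');
  WriteLn(F, 'third');
  CloseFile(F);

  Assert(CountLines(Path) = 3);
  WriteLn(CountLines(Path), ' lines read');
  DeleteFile(Path);
end.
```

Strictly speaking only the string reference lives on the stack; the character data of an AnsiString is heap-allocated and freed automatically, which is why no cleanup code appears for Line.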
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Nothing stops you from doing this yourself. But for something as basic as text operations, I think this is bloat. Imagine that you would have to do I:=TInteger.Create(1); J:=TInteger.Create(2); I.Add(J); What kind of language do you end up with then? Utterly unreadable, and slow, because heavily relying on the heap. Michael. Bad example. Numbers are scalars, strings are vectors; the += operator is not as straightforward as for numbers. Who else except Pascal developers knows that s:=s1+s2 is string concatenation and invokes a lot of hidden stuff that is out of control?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
In our previous episode, Graeme Geldenhuys said: > > The Pascal huge strings always annoy me, since each one is an IMPLICIT > > automatic object with a set of overloaded methods, > > with length and reference count fields etc. hidden from the developer. > > > > In the near future we get a zoo of strings: > > AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar, > > some of them with an encoding field. > > I have to say I agree with you. The Object Pascal / Delphi language > already has way too many string types! And it's just getting worse. Well, then only use one? What is the problem? As soon as the RTL is UnicodeString-enabled, throw away anything that is not Unicode, create everything new in Unicode, and be done with it. Legacy always causes ballast. > I've always liked the Java style of everything being an object - even > the string type. It creates a lot of trouble (very visible in Java with its need for StringBuilder), but it is not exactly clear what it solves.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Tue, 9 Sep 2008, Anton Kavalenka wrote: > Florian Klaempfl wrote: > > I've continued to work on support of a unicodestring type in fpc. It's > > currently in an svn branch at: > > http://svn.freepascal.org/svn/fpc/branches/unicodestring > > and will be merged later to trunk. The unicodestring type is a ref. counted > > utf-16 string. On non-windows, widestring is mapped to this type. If you're > > interested in unicode support please test, give feedback here and submit > > fixes. > > > > An existing working copy of trunk can be switched to this branch by > > cd fpc > > svn switch http://svn.freepascal.org/svn/fpc/branches/unicodestring > > and back with > > svn switch http://svn.freepascal.org/svn/fpc/trunk > > > The Pascal huge strings always annoy me, > since each one is an IMPLICIT automatic object with a set of overloaded methods, with length > and reference count fields etc. hidden from the developer. > > In the near future we get a zoo of strings: > AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar, > some of them with an encoding field. > > Why not make it an EXPLICIT object? > > s:=TCoolFPCString.Create('Test'); Nothing stops you from doing this yourself. But for something as basic as text operations, I think this is bloat. Imagine that you would have to do I:=TInteger.Create(1); J:=TInteger.Create(2); I.Add(J); What kind of language do you end up with then? Utterly unreadable, and slow, because heavily relying on the heap. Michael.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On 9/9/08, Anton Kavalenka <[EMAIL PROTECTED]> wrote: > The Pascal huge strings always annoy me, > since each one is an IMPLICIT automatic object with a set of overloaded methods, > with length and reference count fields etc. hidden from the developer. > > In the near future we get a zoo of strings: > AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar, > some of them with an encoding field. I have to say I agree with you. The Object Pascal / Delphi language already has way too many string types! And it's just getting worse. I've always liked the Java style of everything being an object - even the string type. Regards, - Graeme - ___ fpGUI - a cross-platform Free Pascal GUI toolkit http://opensoft.homeip.net/fpgui/
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Florian Klaempfl wrote: I've continued to work on support of a unicodestring type in fpc. It's currently in an svn branch at: http://svn.freepascal.org/svn/fpc/branches/unicodestring and will be merged later to trunk. The unicodestring type is a ref. counted utf-16 string. On non-windows, widestring is mapped to this type. If you're interested in unicode support please test, give feedback here and submit fixes. An existing working copy of trunk can be switched to this branch by

  cd fpc
  svn switch http://svn.freepascal.org/svn/fpc/branches/unicodestring

and back with

  svn switch http://svn.freepascal.org/svn/fpc/trunk

The Pascal huge strings always annoy me, since each one is an IMPLICIT automatic object with a set of overloaded methods, with length and reference count fields etc. hidden from the developer. In the near future we get a zoo of strings: AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar, some of them with an encoding field. Why not make it an EXPLICIT object?

  s:=TCoolFPCString.Create('Test');
  s2:=TCoolFPCString.Create(''); //UTF8 encoded constant
  s.asUtf8+=s2;
  SetWindowTextW(WinHandle,s.AsUnicodeString); // I explicitly say: get me a wide string and do NOT do any compiler magic
  if (s1.length=length(s2))... // generic runtime function length returns the property of the cool object
  s1.AcquireLock // prevent access from other threads
  s1.Clear;
  s1.LoadFromResource(n_ResourceId); // just use GNU gettext
  s1.LoadTranslationFromResource(n_resID,'be_BY');
  s1.ReleaseLock // allow access from other threads

Anyway, I could then just subclass the standard string and get new functionality with a wealth of available fields and methods. FPC supports operators, so there are lots of ways to represent the string, assign the string, or load it from a resource. Make it thread-safe at the implementation level, not at the compiler level. Standard string, unicode string, ansistring and widestring can be implemented as wrappers around this object. It seems this is how it is done in mseGUI.
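Michael's answer ("nothing stops you from doing this yourself") can be tried out. The following is a hypothetical sketch of an explicit string object along the lines of TCoolFPCString; the class name and its methods are invented for illustration, and it assumes FPC 2.4 or later, where UnicodeString, UTF8Encode and UTF8Decode are available in the system unit. Every conversion becomes a named call, at the cost of manual Create/Free.

```pascal
program explicitstring;
{$mode objfpc}{$H+}
{$C+} // enable Assert checks

type
  { Hypothetical explicit string object: conversions are named methods
    instead of implicit compiler-inserted calls. }
  TExplicitString = class
  private
    FData: UnicodeString; // stored as UTF-16 internally
    function GetAsUTF8: UTF8String;
    procedure SetAsUTF8(const Value: UTF8String);
  public
    constructor Create(const AUTF8: UTF8String);
    function Length: SizeInt; // number of UTF-16 code units
    procedure Append(Other: TExplicitString);
    property AsUTF8: UTF8String read GetAsUTF8 write SetAsUTF8;
    property AsUnicodeString: UnicodeString read FData;
  end;

constructor TExplicitString.Create(const AUTF8: UTF8String);
begin
  inherited Create;
  SetAsUTF8(AUTF8);
end;

function TExplicitString.GetAsUTF8: UTF8String;
begin
  Result := UTF8Encode(FData); // explicit UTF-16 -> UTF-8 conversion
end;

procedure TExplicitString.SetAsUTF8(const Value: UTF8String);
begin
  FData := UTF8Decode(Value);  // explicit UTF-8 -> UTF-16 conversion
end;

function TExplicitString.Length: SizeInt;
begin
  Result := System.Length(FData); // the method name hides System.Length here
end;

procedure TExplicitString.Append(Other: TExplicitString);
begin
  FData := FData + Other.FData;
end;

var
  S, S2: TExplicitString;
begin
  S := TExplicitString.Create('Test');
  S2 := TExplicitString.Create('ing');
  try
    S.Append(S2);
    Assert(S.Length = 7);
    Assert(S.AsUTF8 = 'Testing');
    WriteLn(S.AsUTF8);
  finally
    S2.Free; // the price of explicit objects: lifetime is managed by hand
    S.Free;
  end;
end.
```

The try/finally boilerplate is roughly what Michael's TInteger example warns about; thread locking, as in Anton's AcquireLock/ReleaseLock, would add further calls on top.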
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Sunday 07 September 2008 21.23:24 Florian Klaempfl wrote: > > > > Trunk 11723 does not compile: > > Trunk or unicodestring branch? Strange, because here it works? Unicodestring branch, sorry, I should change the directory name of my switched checkout. Does your unicodestring branch compile? Martin
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Martin Schreiber wrote: On Sunday 07 September 2008 10.58:03 Florian Klaempfl wrote: Martin Schreiber wrote: On Saturday 06 September 2008 21.08:50 Florian Klaempfl wrote: Martin Schreiber wrote: Next problem is that pmsechar(msestring) returns a NIL pointer if msestring = ''. As designed? The behaviour of ansistring and widestring was very useful; I'd like UnicodeString to behave the same. Do you have some example code which shows this? Fixed. Trunk 11723 does not compile: Trunk or unicodestring branch? Strange, because here it works?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Sunday 07 September 2008 10.58:03 Florian Klaempfl wrote: > Martin Schreiber wrote: > > On Saturday 06 September 2008 21.08:50 Florian Klaempfl wrote: > >> Martin Schreiber wrote: > >>> Next problem is that pmsechar(msestring) returns a NIL pointer if > >>> msestring = ''. As designed? The behaviour of ansistring and widestring > >>> was very useful; I'd like UnicodeString to behave the same. > >> > >> Do you have some example code which shows this? > > Fixed. Trunk 11723 does not compile:

"
make[7]: Entering directory `E:/FPC/svn/trunk/rtl/win32'
C:/FPC/2.2.2/bin/i386-Win32/gmkdir.exe -p E:/FPC/svn/trunk/rtl/units/i386-win32
C:/FPC/2.2.2/bin/i386-Win32/ppc386.exe -Ur -Xs -O2 -n -Fi../inc -Fi../i386 -Fi../win -FE. -FUE:/FPC/svn/trunk/rtl/units/i386-win32 -di386 -dRELEASE -Us -Sg system.pp -Fi../win
wustring22.inc(699,27) Fatal: Unknown compilerproc "fpc_char_to_wchar". Check if you use the correct run time library.
Fatal: Compilation aborted
make[7]: *** [system.ppu] Fehler 1
make[7]: Leaving directory `E:/FPC/svn/trunk/rtl/win32'
make[6]: *** [win32_all] Fehler 2
make[6]: Leaving directory `E:/FPC/svn/trunk/rtl'
make[5]: *** [rtl] Fehler 2
make[5]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[4]: *** [next] Fehler 2
make[4]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[3]: *** [ppc1.exe] Fehler 2
make[3]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[2]: *** [cycle] Fehler 2
make[2]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[1]: *** [compiler_cycle] Fehler 2
make[1]: Leaving directory `E:/FPC/svn/trunk'
make: *** [build-stamp.i386-win32] Fehler 2
"
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Martin Schreiber wrote: On Saturday 06 September 2008 21.08:50 Florian Klaempfl wrote: Martin Schreiber wrote: Next problem is that pmsechar(msestring) returns a NIL pointer if msestring = ''. As designed? The behaviour of ansistring and widestring was very useful; I'd like UnicodeString to behave the same. Do you have some example code which shows this? Fixed.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Saturday 06 September 2008 21.08:50 Florian Klaempfl wrote: > Martin Schreiber wrote: > > > Next problem is that pmsechar(msestring) returns a NIL pointer if > > msestring = ''. As designed? The behaviour of ansistring and widestring > > was very useful; I'd like UnicodeString to behave the same. > > Do you have some example code which shows this? See attachment. Test result:

"
F:\proj\testcase\fpc\unicode\punicodechar>punicodechartest.exe
4288048
4288048
0
0
0
An unhandled exception occurred at $004016C5 :
EAccessViolation : Access violation
  $004016C5  main,  line 25 of punicodechartest.pas
"

Martin

program punicodechartest;
{$ifdef FPC}{$mode objfpc}{$h+}{$endif}
{$ifdef mswindows}{$apptype console}{$endif}
uses
 {$ifdef FPC}{$ifdef linux}cthreads,{$endif}{$endif}
 sysutils;
var
 astr: ansistring;
 wstr: widestring;
 ustr: unicodestring;
begin
 astr:= '';
 wstr:= '';
 ustr:= '';
 writeln(ptrint(pansichar(astr))); flush(output);
 writeln(ptrint(pwidechar(wstr))); flush(output);
 writeln(ptrint(punicodechar(ustr))); flush(output);
 writeln(ord(pansichar(astr)^)); flush(output);
 writeln(ord(pwidechar(wstr)^)); flush(output);
 writeln(ord(punicodechar(ustr)^)); flush(output);
end.
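The behaviour Martin relies on can be written down as a small self-check. This is a sketch assuming an FPC release that contains the fix Florian mentions (current releases should): an empty string variable holds nil, but the typecast to PAnsiChar/PUnicodeChar substitutes a pointer to a shared zero terminator, so the dereference that crashed in the test above becomes safe.

```pascal
program emptypchar;
{$mode objfpc}{$H+}
{$C+} // enable Assert checks

var
  A: AnsiString;
  U: UnicodeString;
begin
  A := '';
  U := '';
  // The string variable itself is nil when the string is empty...
  Assert(Pointer(A) = nil);
  Assert(Pointer(U) = nil);
  // ...but the character-pointer typecast yields a valid pointer to #0,
  // so C-style APIs always receive an empty, zero-terminated buffer.
  Assert(PAnsiChar(A) <> nil);
  Assert(PAnsiChar(A)^ = #0);
  Assert(PUnicodeChar(U) <> nil);
  Assert(PUnicodeChar(U)^ = #0);
  WriteLn('empty strings cast to non-nil, zero-terminated pointers');
end.
```

Code that really needs to distinguish "empty" from "non-empty" should test Length(S) = 0 or Pointer(S) = nil rather than the result of the typecast.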
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Martin Schreiber wrote: On Friday 05 September 2008 22.50:23 Florian Klaempfl wrote: [...] This should be fixed. Thanks, FPC and MSEide compile now. Attached is an "emergency" patch with which I could load the MSEgui forms; it is not finished and not tested. Thanks. Is TTypekind = (... tkInterfaceRaw,tkUChar,tkUString) correct? Almost; a slightly modified patch is applied. Next problem is that pmsechar(msestring) returns a NIL pointer if msestring = ''. As designed? The behaviour of ansistring and widestring was very useful; I'd like UnicodeString to behave the same. Do you have some example code which shows this?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Friday 05 September 2008 22.50:23 Florian Klaempfl wrote: > [...] > This should be fixed. Thanks, FPC and MSEide compile now. Attached is an "emergency" patch with which I could load the MSEgui forms; it is not finished and not tested. Is TTypekind = (... tkInterfaceRaw,tkUChar,tkUString) correct? Next problem is that pmsechar(msestring) returns a NIL pointer if msestring = ''. As designed? The behaviour of ansistring and widestring was very useful; I'd like UnicodeString to behave the same. Thanks, Martin

Index: rtl/objpas/classes/classesh.inc
===================================================================
--- rtl/objpas/classes/classesh.inc	(revision 11713)
+++ rtl/objpas/classes/classesh.inc	(working copy)
@@ -899,7 +899,8 @@
   TValueType = (vaNull, vaList, vaInt8, vaInt16, vaInt32, vaExtended,
     vaString, vaIdent, vaFalse, vaTrue, vaBinary, vaSet, vaLString,
-    vaNil, vaCollection, vaSingle, vaCurrency, vaDate, vaWString, vaInt64, vaUTF8String);
+    vaNil, vaCollection, vaSingle, vaCurrency, vaDate, vaWString, vaInt64,
+    vaUTF8String,vaUString);
 
   TFilerFlag = (ffInherited, ffChildPos, ffInline);
   TFilerFlags = set of TFilerFlag;
@@ -965,6 +966,7 @@
     function ReadStr: String; virtual; abstract;
     function ReadString(StringType: TValueType): String; virtual; abstract;
     function ReadWideString: WideString;virtual;abstract;
+    function ReadUnicodeString: UnicodeString;virtual;abstract;
     procedure SkipComponent(SkipComponentInfos: Boolean); virtual; abstract;
     procedure SkipValue; virtual; abstract;
   end;
@@ -1016,6 +1018,7 @@
     function ReadStr: String; override;
     function ReadString(StringType: TValueType): String; override;
     function ReadWideString: WideString;override;
+    function ReadUnicodeString: UnicodeString;override;
     procedure SkipComponent(SkipComponentInfos: Boolean); override;
     procedure SkipValue; override;
   end;
@@ -1101,6 +1104,7 @@
     function ReadBoolean: Boolean;
     function ReadChar: Char;
     function ReadWideChar: WideChar;
+    function ReadUnicodeChar: UnicodeChar;
     procedure ReadCollection(Collection: TCollection);
     function ReadComponent(Component: TComponent): TComponent;
     procedure ReadComponents(AOwner, AParent: TComponent;
@@ -1119,6 +1123,7 @@
     function ReadRootComponent(ARoot: TComponent): TComponent;
     function ReadString: string;
     function ReadWideString: WideString;
+    function ReadUnicodeString: UnicodeString;
     function ReadValue: TValueType;
     procedure CopyValue(Writer: TWriter);
     property Driver: TAbstractObjectReader read FDriver;
@@ -1170,6 +1175,7 @@
     procedure WriteSet(Value: LongInt; SetType: Pointer); virtual; abstract;
     procedure WriteString(const Value: String); virtual; abstract;
     procedure WriteWideString(const Value: WideString);virtual;abstract;
+    procedure WriteUnicodeString(const Value: UnicodeString);virtual;abstract;
   end;
 
   { TBinaryObjectWriter }
@@ -1220,6 +1226,7 @@
     procedure WriteSet(Value: LongInt; SetType: Pointer); override;
     procedure WriteString(const Value: String); override;
     procedure WriteWideString(const Value: WideString); override;
+    procedure WriteUnicodeString(const Value: UnicodeString); override;
   end;
 
   TTextObjectWriter = class(TAbstractObjectWriter)
Index: rtl/objpas/classes/reader.inc
===================================================================
--- rtl/objpas/classes/reader.inc	(revision 11713)
+++ rtl/objpas/classes/reader.inc	(working copy)
@@ -339,6 +339,25 @@
   end;
 end;
 
+function TBinaryObjectReader.ReadUnicodeString: UnicodeString;
+var
+  len: DWord;
+{$IFDEF ENDIAN_BIG}
+  i : integer;
+{$ENDIF}
+begin
+  len := ReadDWord;
+  SetLength(Result, len);
+  if (len > 0) then
+  begin
+    Read(Pointer(@Result[1])^, len*2);
+    {$IFDEF ENDIAN_BIG}
+    for i:=1 to len do
+      Result[i]:=UnicodeChar(SwapEndian(word(Result[i])));
+    {$ENDIF}
+  end;
+end;
+
 procedure TBinaryObjectReader.SkipComponent(SkipComponentInfos: Boolean);
 var
   Flags: TFilerFlags;
@@ -749,6 +768,19 @@
     raise EReadError.Create(SInvalidPropertyValue);
 end;
 
+function TReader.ReadUnicodeChar: UnicodeChar;
+
+var
+  U: UnicodeString;
+
+begin
+  U := ReadUnicodeString;
+  if Length(U) = 1 then
+    Result := U[1]
+  else
+    raise EReadError.Create(SInvalidPropertyValue);
+end;
+
 procedure TReader.ReadCollection(Collection: TCollection);
 var
   Item: TCollectionItem;
@@ -1172,7 +1204,7 @@
       SetOrdProp(Instance, PropInfo, Ord(ReadBoolean));
     tkChar:
       SetOrdProp(Instance, PropInfo, Ord(ReadChar));
-    tkWChar:
+    tkWChar,tkUChar:
       SetOrdProp(Instance, PropInfo, Ord(ReadWideChar));
     tkEnumeration:
       begin
@@ -1217,6 +1249,8 @@
           FOnReadStringProperty(Self,Instance,PropInfo,TmpStr);
         SetStrProp(Instance, PropInfo, TmpStr);
       end;
+    tkUstring:
+      SetUnicodeStrProp(Instance,PropInfo,ReadUnicodeString
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Friday 05 September 2008 22.50:23 Florian Klaempfl wrote: > > If you want to try it yourself, MSEide+MSEgui trunk rev. 2473 has > > msestring = unicodestring if compiled with -dmse_unicodestring. > > What's the official way to compile MSE? >

cd apps\ide
ppc386.exe -Fu..\..\lib\common\* -Fi..\..\lib\common\kernel -Fu..\..\lib\common\kernel\i386-win32 mseide.pas

or open apps\ide\mseide.prj in MSEide with 'Project'-'Open', then 'Project'-'Make'. To test UnicodeString, the command line is:

ppc386.exe -dmse_unicodestring -Fu..\..\lib\common\* -Fi..\..\lib\common\kernel -Fu..\..\lib\common\kernel\i386-win32 mseide.pas

If you want to debug the compiler with MSEide, add the compiler source directories to 'Project'-'Options'-'Debugger'-'Source directories'. From an older post on this list:

"
This is for MSEide i386 and FPC 2.3.1:
http://sourceforge.net/project/showfiles.php?group_id=165409
- 'Project'-'New'-'From Program'.
- Select "compiler/pp.pas" from your FPC SVN checkout.
- Accept "pp.prj".
- 'Project'-'Options'-'Make'-'Make options', add "-di386" (without quotes) to the first row of 'Command line options'.
- 'Project'-'Options'-'Make'-'Directories', add a row, select "compiler/i386/".
- Add a row with "/compiler/x86/".
- Add a row with "/compiler/systems/".
- Set the command line parameters for the target in 'Target'-'Environment'.
- Press F9.
You may need to change the unit output directory to be consistent with the makefile.
"

Martin
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Martin Schreiber wrote: > Florian, > On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote: >> I've continued to work on support of a unicodestring type in fpc. It's currently in an svn branch at: http://svn.freepascal.org/svn/fpc/branches/unicodestring and will be merged later to trunk. The unicodestring type is a ref. counted utf-16 string. On non-windows, widestring is mapped to this type. If you're interested in unicode support please test, give feedback here and submit fixes. > I tried the unicode branch on Windows, rev. 11711 does not compile:
> make[7]: Entering directory `E:/FPC/svn/trunk/rtl/win32'
> E:/FPC/svn/trunk/compiler/ppc1.exe -Ur -Xs -O2 -n -Fi../inc -Fi../i386 -Fi../win -FE. -FUE:/FPC/svn/trunk/rtl/units/i386-win32 -di386 -dRELEASE -Us -Sg system.pp -Fi../win
> wstrings.inc(1655,60) Error: Identifier not found "CharLengthPChar"
> ustrings.inc(2147,42) Error: Incompatible types: got "procedure(PChar,var UnicodeString, LongInt);Register>" expected "procedure(PChar,var WideString, LongInt);Register>"
> ustrings.inc(2148,44) Error: Incompatible types: got "function(const UnicodeString):UnicodeString;Register>" expected "function(const WideString):WideString;Register>"
> ustrings.inc(2149,44) Error: Incompatible types: got "function(const UnicodeString):UnicodeString;Register>" expected "function(const WideString):WideString;Register>"
> ustrings.inc(2151,46) Error: Incompatible types: got "function(const UnicodeString,const UnicodeString):LongInt;Register>" expected "variable type of function(const WideString,const WideString):LongInt;Register>"
> ustrings.inc(2152,50) Error: Incompatible types: got "function(const UnicodeString,const UnicodeString):LongInt;Register>" expected "variable type of function(const WideString,const WideString):LongInt;Register>"
> system.pp(1253) Fatal: There were 6 errors compiling module, stopping
> Fatal: Compilation aborted
> make[7]: *** [system.ppu] Fehler 1
> make[7]: Leaving directory `E:/FPC/svn/trunk/rtl/win32'
> make[6]: *** [win32_all] Fehler 2
> make[6]: Leaving directory `E:/FPC/svn/trunk/rtl'
> make[5]: *** [rtl] Fehler 2
> make[5]: Leaving directory `E:/FPC/svn/trunk/compiler'
> make[4]: *** [next] Fehler 2
> make[4]: Leaving directory `E:/FPC/svn/trunk/compiler'
> make[3]: *** [ppc2.exe] Fehler 2
> make[3]: Leaving directory `E:/FPC/svn/trunk/compiler'
> make[2]: *** [cycle] Fehler 2
> make[2]: Leaving directory `E:/FPC/svn/trunk/compiler'
> make[1]: *** [compiler_cycle] Fehler 2
> make[1]: Leaving directory `E:/FPC/svn/trunk'
> make: *** [build-stamp.i386-win32] Fehler 2

This should be fixed.

> Compiling MSEide with rev. 11667 I get:
> Free Pascal Compiler version 2.3.1 [2008/09/05] for i386
> Copyright (c) 1993-2008 by Florian Klaempfl
> Target OS: Win32 for i386
> Compiling mseide.pas
> [...]
> msestream.pas(762,2) Warning: Class types "tmsefilestream" and "THandleStreamcracker" are not related
> msestream.pas(785,33) Warning: Class types "tmsefilestream" and "THandleStreamcracker" are not related
> msestream.pas(810,34) Warning: Class types "tmsefilestream" and "THandleStreamcracker" are not related
> msesysintf.pas(1552,21) Fatal: Unknown compilerproc "fpc_widechararray_to_unicodestr". Check if you use the correct run time library.
> Fatal: Compilation aborted
> If you want to try it yourself, MSEide+MSEgui trunk rev. 2473 has msestring = unicodestring if compiled with -dmse_unicodestring.

What's the official way to compile MSE?

> I found no UnicodeString support in typeinfo and variants?

Indeed, this must be added.

> What are the plans for Unicode resourcestrings?

Not decided yet.

> TField should probably have an asUnicodeString property too.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Florian, On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote: > I've continued to work on support of a unicodestring type in fpc. It's > currently in an svn branch at: > http://svn.freepascal.org/svn/fpc/branches/unicodestring > and will be merged later to trunk. The unicodestring type is a ref. > counted utf-16 string. On non-windows, widestring is mapped to this > type. If you're interested in unicode support please test, give feedback > here and submit fixes. > I tried the unicode branch on Windows, rev. 11711 does not compile:

make[7]: Entering directory `E:/FPC/svn/trunk/rtl/win32'
E:/FPC/svn/trunk/compiler/ppc1.exe -Ur -Xs -O2 -n -Fi../inc -Fi../i386 -Fi../win -FE. -FUE:/FPC/svn/trunk/rtl/units/i386-win32 -di386 -dRELEASE -Us -Sg system.pp -Fi../win
wstrings.inc(1655,60) Error: Identifier not found "CharLengthPChar"
ustrings.inc(2147,42) Error: Incompatible types: got "" expected ""
ustrings.inc(2148,44) Error: Incompatible types: got "" expected ""
ustrings.inc(2149,44) Error: Incompatible types: got "" expected ""
ustrings.inc(2151,46) Error: Incompatible types: got "" expected ""
ustrings.inc(2152,50) Error: Incompatible types: got "" expected ""
system.pp(1253) Fatal: There were 6 errors compiling module, stopping
Fatal: Compilation aborted
make[7]: *** [system.ppu] Fehler 1
make[7]: Leaving directory `E:/FPC/svn/trunk/rtl/win32'
make[6]: *** [win32_all] Fehler 2
make[6]: Leaving directory `E:/FPC/svn/trunk/rtl'
make[5]: *** [rtl] Fehler 2
make[5]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[4]: *** [next] Fehler 2
make[4]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[3]: *** [ppc2.exe] Fehler 2
make[3]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[2]: *** [cycle] Fehler 2
make[2]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[1]: *** [compiler_cycle] Fehler 2
make[1]: Leaving directory `E:/FPC/svn/trunk'
make: *** [build-stamp.i386-win32] Fehler 2

Compiling MSEide with rev. 11667 I get:

Free Pascal Compiler version 2.3.1 [2008/09/05] for i386
Copyright (c) 1993-2008 by Florian Klaempfl
Target OS: Win32 for i386
Compiling mseide.pas
[...]
msestream.pas(762,2) Warning: Class types "tmsefilestream" and "THandleStreamcracker" are not related
msestream.pas(785,33) Warning: Class types "tmsefilestream" and "THandleStreamcracker" are not related
msestream.pas(810,34) Warning: Class types "tmsefilestream" and "THandleStreamcracker" are not related
msesysintf.pas(1552,21) Fatal: Unknown compilerproc "fpc_widechararray_to_unicodestr". Check if you use the correct run time library.
Fatal: Compilation aborted

If you want to try it yourself, MSEide+MSEgui trunk rev. 2473 has msestring = unicodestring if compiled with -dmse_unicodestring. I found no UnicodeString support in typeinfo and variants? What are the plans for Unicode resourcestrings? TField should probably have an asUnicodeString property too. Thank you very much for your work. Martin
Re: [fpc-devel] Unicodestring branch, please test and help fixing
In our previous episode, Marc Weustink said: > OK, then we name it objects (or records with methods) > > > Before you know it you are messing with special stringbuilder classes and > > special syntax to keep a semblance of performance. Moreover I don't really > > see what this solves. > > It solves the case that you want to have records/objects with non-standard > initialisation/finalisation code. Refcounted assignments etc. Yes, it allows more leeway for DIY. But I let the fact that the common situation gets messier prevail over that.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Marco van de Voort wrote: In our previous episode, Ivo Steinmann said: Why not create a new kind of managed class that is refcounted, initialized, finalized, etc., like the String type? I never liked string types as classes. They feel like cheap imitations of a real string type. OK, then we name it objects (or records with methods). Before you know it you are messing with special stringbuilder classes and special syntax to keep a semblance of performance. Moreover I don't really see what this solves. It solves the case that you want to have records/objects with non-standard initialisation/finalisation code. Refcounted assignments etc. If this were possible, I would have used it for the UTF encoding of strings and for the wincontrol.handle/reference in Lazarus. Marc
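What Marc describes, records with user-defined initialisation and finalisation, did not exist when this thread was written. Later FPC versions (3.2 and up) added "management operators" for advanced records. A minimal sketch, assuming FPC 3.2 or later; TTracked and the two counters are hypothetical names used only to observe when the operators run.

```pascal
program managedrecord;
{$mode objfpc}{$H+}
{$modeswitch advancedrecords}
{$C+} // enable Assert checks

var
  InitCount, FinalCount: Integer; // global counters, zero at program start

type
  TTracked = record
    Value: Integer;
    class operator Initialize(var R: TTracked);
    class operator Finalize(var R: TTracked);
  end;

class operator TTracked.Initialize(var R: TTracked);
begin
  R.Value := 0;
  Inc(InitCount);  // called automatically when a TTracked variable comes into scope
end;

class operator TTracked.Finalize(var R: TTracked);
begin
  Inc(FinalCount); // called automatically when it goes out of scope
end;

procedure Demo;
var
  R: TTracked;
begin
  Assert(InitCount = 1); // Initialize already ran on procedure entry
  Assert(R.Value = 0);
  R.Value := 42;
end;                     // Finalize runs here

begin
  Demo;
  Assert(FinalCount = 1);
  WriteLn('init=', InitCount, ' final=', FinalCount);
end.
```

The same mechanism also offers AddRef and Copy operators, which is where refcounted assignment of the kind Marc mentions would be hooked in.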
Re: [fpc-devel] Unicodestring branch, please test and help fixing
In our previous episode, Ivo Steinmann said: > Why not create a new kind of managed class that is refcounted, > initialized, finalized, etc., like the String type? I never liked string types as classes. They feel like cheap imitations of a real string type. Before you know it you are messing with special stringbuilder classes and special syntax to keep a semblance of performance. Moreover I don't really see what this solves.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Marco van de Voort wrote:
> In our previous episode, Luiz Americo Pereira Camara said:
>>>> And use TNativeString for encoding agnostic purposes.
>>>
>>> Well, really agnostic code should simply use "string" :)
>>
>> Delphi is introducing the RawByteString type, which skips the automatic
>> encoding conversion. I don't know where it fits in the upcoming Unicode
>> schema. Anyway, there's an example of how to use it:
>> http://www.micro-isv.asia/2008/08/using-rawbytestring-effectively/
>
> That's the low-level agnostic way.
>
> I'm talking more about purposes like class libraries, which will want to
> use a native type under both conventions, but will generally operate on
> the strings at a relatively high level.

Why not create a new kind of managed class that is refcounted, initialized, finalized, etc., like the String type? Then create a string type on top of this managed class. So String is going to be a class. For a Unicode string you create a descendant of this class with Unicode implementations. This way it's still compatible with the base class String.

The managed class should follow these rules ({c} marks code the compiler would generate):

EXAMPLE ---
procedure Param(S: TManagedClass);
procedure VarParam(var S: TManagedClass);
procedure OutParam(out S: TManagedClass);
procedure ConstParam(const S: TManagedClass);

var
  S, T: TManagedClass;
begin
  {c} S := nil;             // compiler code
  {c} T := nil;

  S := 'abcd';
  {c} tmp := TManagedClass.Create('abcd');
  {c} S := S.Assign(tmp);   // Assign takes another managed class to assign and
                            // returns a new instance or reference (Self can be nil)
  {c} tmp.Release;

  Param(S);
  {c} S.NewRef;             // NewRef increases the reference count
  {c} Param(S);
  {c} S.Release;

  VarParam(S);
  {c} VarParam(S);

  OutParam(S);
  {c} S.Release;
  {c} S := nil;
  {c} OutParam(S);

  ConstParam(S);
  {c} ConstParam(S);

  T := S;
  {c} T := T.Assign(S);

  T := S + 'abcd' + S;
  {c} // e.g. a TManagedClass.Append function

  {c} S.Release;            // finalization at the end of the scope
  {c} T.Release;
end;
END ---

I'm aware that a lot of these things could also be implemented as overloaded operators, but NOT the initialization, finalization and parameter handling part. Assign may also be a problem, because I need something like this:

operator := (var Dest: TManagedClass; Src: TManagedClass);

In the case of an assignment, the destination pointer changes, but before I can change the destination pointer, I have to release the old reference. That's not possible with:

operator := (Src: TManagedClass) Dest: TManagedClass;

-Ivo
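A minimal Python model of the reference counting that Ivo's {c} expansion implies may help to follow which calls keep the payload alive. `ManagedString`, `new_ref`, `release`, `assign` and `demo` are hypothetical names that mirror his NewRef/Release/Assign; none of this is an FPC API, and a real implementation would free the payload when the count drops to zero instead of just tracking the number.

```python
class ManagedString:
    """Hypothetical model of Ivo's TManagedClass: a payload plus a reference count."""

    def __init__(self, value: str):
        self.value = value
        self.refcount = 1  # the creator holds the first reference

    def new_ref(self) -> "ManagedString":
        # {c} S.NewRef: one more holder of the same payload
        self.refcount += 1
        return self

    def release(self) -> None:
        # {c} S.Release: a real implementation would free the payload at 0
        self.refcount -= 1


def assign(dest, src):
    """Model of `dest := dest.Assign(src)`: share src, drop dest's old reference.

    src is referenced before dest is released, so self-assignment is safe.
    """
    if src is not None:
        src.new_ref()
    if dest is not None:
        dest.release()
    return src


def demo() -> list:
    """Replays part of the EXAMPLE block and records the payload's refcount."""
    trace = []
    s = t = None                  # {c} S := nil; T := nil;
    tmp = ManagedString("abcd")   # {c} tmp := TManagedClass.Create('abcd');
    s = assign(s, tmp)            # {c} S := S.Assign(tmp);
    tmp.release()                 # {c} tmp.Release;
    trace.append(s.refcount)      # only S holds it now
    s.new_ref()                   # {c} S.NewRef before Param(S)
    trace.append(s.refcount)      # S plus the by-value parameter
    s.release()                   # {c} S.Release after the call
    t = assign(t, s)              # {c} T := T.Assign(S);
    trace.append(s.refcount)      # S and T share one payload
    payload = s
    s.release()                   # {c} S.Release at the end of the scope
    t.release()                   # {c} T.Release at the end of the scope
    trace.append(payload.refcount)
    return trace


print(demo())  # [1, 2, 2, 0]
```

Referencing the source before releasing the destination is what makes `S := S` harmless, which is exactly the ordering problem Ivo points out with a plain `operator :=` that cannot see the old destination.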
Re: [fpc-devel] Unicodestring branch, please test and help fixing
In our previous episode, Luiz Americo Pereira Camara said:
> >> And use TNativeString for encoding agnostic purposes.
> >
> > Well, really agnostic code should simply use "string" :)
>
> Delphi is introducing the RawByteString type, which skips the automatic
> encoding conversion. I don't know where it fits in the upcoming Unicode
> schema. Anyway, there's an example of how to use it:
> http://www.micro-isv.asia/2008/08/using-rawbytestring-effectively/

That's the low-level agnostic way.

I'm talking more about purposes like class libraries, which will want to use a native type under both conventions, but will generally operate on the strings at a relatively high level.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Hello ;)

I'm trying to test your new string type now :) but after switching to the new branch, I couldn't compile FPC ^^

BTW: A year ago, I wrote a complete Unicode string library for a possible new built-in string type. My idea was to create a new built-in type with an additional flag. This flag stored the encoding of the string. By default (if the flag is not explicitly set by the user) it was the current system charmap. I also wrote functions that could convert from any charmap to any other (UTF-8, UTF-16, UCS-2, UCS-4, ISO 8859, codepages, ASCII, etc.). If the user concatenated two strings, one encoded in ASCII and one in UTF-8, the resulting string was UTF-8:

  S1: Unistring;
  S2: Unistring;
  S3: Unistring;

  S1 := 'hello' as ascii;
  S2 := 'foobar' as utf8;
  S3 := S1 + S2;   // S3 is UTF-8

+ the string can hold any kind of charmap, and the string manager is aware of that
- an additional flag is required
+ the optimal encoding is always used
+ the user doesn't have to care about encoding (except when reading from sources with different encodings, like text files)
- maybe some extra encode/decode work is required

-Ivo Steinmann

Florian Klaempfl wrote:
> I've continued to work on support of an unicodestring type in fpc. It's
> currently in an svn branch at:
> http://svn.freepascal.org/svn/fpc/branches/unicodestring
> and will be merged later to trunk. The unicodestring type is a ref.
> counted utf-16 string. On non-windows, widestring is mapped to this
> type. If you're interested in unicode support please test, give feedback
> here and submit fixes.
>
> An existing working copy of trunk can be switched to this branch by
> cd fpc
> svn switch http://svn.freepascal.org/svn/fpc/branches/unicodestring
> and back with
> svn switch http://svn.freepascal.org/svn/fpc/trunk
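Ivo's encoding-flag idea can be sketched in Python as a value that carries its bytes together with the name of their encoding. `TaggedString` and the `CAPABILITY` ranking are assumptions made for illustration only (this is not his library, and not an FPC type); the one rule modelled is the one he describes: concatenating strings with different flags yields a string in the more capable encoding.

```python
# Assumed ranking of how much each encoding can represent (illustrative only).
# Encodings missing from the table are treated like UTF-8 (also an assumption).
CAPABILITY = {"ascii": 0, "latin-1": 1, "utf-8": 2, "utf-16-le": 2}


class TaggedString:
    """Hypothetical Unistring: raw bytes plus a flag naming their encoding."""

    def __init__(self, data: bytes, encoding: str):
        self.data = data
        self.encoding = encoding

    @classmethod
    def from_text(cls, text: str, encoding: str) -> "TaggedString":
        return cls(text.encode(encoding), encoding)

    def text(self) -> str:
        return self.data.decode(self.encoding)

    def __add__(self, other: "TaggedString") -> "TaggedString":
        if self.encoding == other.encoding:
            # Same flag: plain byte concatenation, no conversion work.
            return TaggedString(self.data + other.data, self.encoding)
        # Different flags: transcode both sides into the more capable encoding.
        target = max(self.encoding, other.encoding,
                     key=lambda enc: CAPABILITY.get(enc, 2))
        return TaggedString.from_text(self.text() + other.text(), target)


s1 = TaggedString.from_text("hello", "ascii")
s2 = TaggedString.from_text("foobar", "utf-8")
s3 = s1 + s2
print(s3.encoding, s3.data)  # utf-8 b'hellofoobar'
```

The same-flag fast path is where the "+" items pay off; the transcoding branch is the "extra encode/decode work" listed as a "-".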
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Daniël Mantione wrote:
> On Sat, 30 Aug 2008, Marco van de Voort wrote:
>> So then you can (hopefully) pretty much do
>>
>> {$ifdef unix} // in reality it is more complicated than ifdef unix, but for now..
>>   TNativeString = type ansistring (CP_UTF8);
>> {$else}
>>   TNativeString = type TUnicodeString;
>> {$endif}
>>
>> And use TNativeString for encoding agnostic purposes.
>
> Well, really agnostic code should simply use "string" :)

Delphi is introducing the RawByteString type, which skips the automatic encoding conversion. I don't know where it fits in the upcoming Unicode schema. Anyway, there's an example of how to use it:
http://www.micro-isv.asia/2008/08/using-rawbytestring-effectively/

Luiz
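Luiz's RawByteString point can be illustrated with a simplified Python model of code-page-aware strings: assigning to a variable declared with a different code page converts the bytes, while a raw destination passes them through and keeps the source's code page (our understanding of the Delphi behaviour, not a verified specification). `AnsiValue`, `assign_to` and `RAW` are hypothetical names, not Delphi or FPC APIs.

```python
from typing import Optional

RAW = None  # stands in for a RawByteString destination: no declared code page


class AnsiValue:
    """Hypothetical model of a code-page-aware ansistring payload."""

    def __init__(self, data: bytes, codepage: str):
        self.data = data
        self.codepage = codepage


def assign_to(src: AnsiValue, dest_codepage: Optional[str]) -> AnsiValue:
    """Model of assigning src to a variable declared with dest_codepage."""
    if dest_codepage is RAW or dest_codepage == src.codepage:
        # No automatic conversion: the bytes keep the code page they came with.
        return AnsiValue(src.data, src.codepage)
    # Declared code pages differ: transcode through Unicode.
    text = src.data.decode(src.codepage)
    return AnsiValue(text.encode(dest_codepage), dest_codepage)


legacy = AnsiValue(b"caf\xe9", "cp1252")  # "café" as stored by a cp1252 source
as_utf8 = assign_to(legacy, "utf-8")
as_raw = assign_to(legacy, RAW)
print(as_utf8.data, as_utf8.codepage)  # b'caf\xc3\xa9' utf-8
print(as_raw.data, as_raw.codepage)    # b'caf\xe9' cp1252
```

The raw path is what makes such a type suitable for low-level routines that must not touch the bytes; Marco's TNativeString targets the other case, where high-level code wants one declared encoding per platform.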
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Graeme Geldenhuys wrote:
> On Sat, Aug 30, 2008 at 4:07 PM, Florian Klaempfl <[EMAIL PROTECTED]> wrote:
>>> I don't know what is "core", so would you mind forwarding the related
>>> messages to this group?
>>
>> Not really, we had enough useless and time wasting discussions about this.
>
> Still doesn't answer my question as to what "core" is... Is that a
> different mailing list to fpc-devel?

Yes, an invite-only mailing list for active developers.

> If so, is there an archive I can search?

No :)

> I would just like to know the arguments (background information) for
> both utf-[8,16]

UTF-8 will be supported by an extended ansistring which will support different encodings.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Sat, Aug 30, 2008 at 4:07 PM, Florian Klaempfl <[EMAIL PROTECTED]> wrote:
>>
>> I don't know what is "core", so would you mind forwarding the related
>> messages to this group?
>
> Not really, we had enough useless and time wasting discussions about this.

Still doesn't answer my question as to what "core" is... Is that a different mailing list to fpc-devel? If so, is there an archive I can search?

I would just like to know the arguments (background information) for both utf-[8,16].

Regards,
  - Graeme -

___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/
Re: [fpc-devel] Unicodestring branch, please test and help fixing
JoshyFun wrote:
> Hello Florian,
>
> Saturday, August 30, 2008, 1:37:42 PM, you wrote:
>
> FK> I've continued to work on support of an unicodestring type in fpc. It's
> FK> currently in an svn branch at:
> FK> http://svn.freepascal.org/svn/fpc/branches/unicodestring
> FK> and will be merged later to trunk. The unicodestring type is a ref.
> FK> counted utf-16 string. On non-windows, widestring is mapped to this
> FK> type. If you're interested in unicode support please test, give feedback
> FK> here and submit fixes.
>
> I'm writing some Unicode support functions; they are mostly based on the
> current WideString format. Is there any important technical difference
> which could prevent the current code from working like the WideString one?

This depends on the code, and that's why there is this branch :)
Re: [fpc-devel] Unicodestring branch, please test and help fixing
On Sat, 30 Aug 2008, Marco van de Voort wrote:
> In our previous episode, Michael Van Canneyt said:
>>> and back with
>>> svn switch http://svn.freepascal.org/svn/fpc/trunk
>>
>> What happened to the idea of dynamic encoding?
>> And why UTF-16? Unix uses UTF-8 by default, which means that a
>> conversion must be done each time you interface to the OS?
>
> I assume this means the Tiburon UTF-8 extension to ansistring follows on
> this change.
>
> So then you can (hopefully) pretty much do
>
> {$ifdef unix} // in reality it is more complicated than ifdef unix, but for now..
>   TNativeString = type ansistring (CP_UTF8);
> {$else}
>   TNativeString = type TUnicodeString;
> {$endif}
>
> And use TNativeString for encoding agnostic purposes.

Well, really agnostic code should simply use "string" :)

Daniël
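Michael's question about conversion cost can be made concrete with a small Python sketch: with a UTF-16 string type, every call into a UTF-8 Unix API needs a transcode, whereas the wide Windows APIs accept UTF-16 directly. `native_bytes` and `utf16_units` are hypothetical helpers, and a UTF-8 locale on Unix is assumed; the point is only to show what each side of the interface actually receives.

```python
def native_bytes(text: str, platform: str) -> bytes:
    """Hypothetical helper: encode text the way each OS's native API expects it."""
    if platform == "unix":
        return text.encode("utf-8")      # assumption: UTF-8 locale
    if platform == "windows":
        return text.encode("utf-16-le")  # wide ("W") Windows APIs take UTF-16
    raise ValueError(f"unknown platform: {platform}")


def utf16_units(text: str) -> int:
    """Number of 16-bit code units a UTF-16 string type would store."""
    return len(text.encode("utf-16-le")) // 2


# "h\u00e9llo" has one non-ASCII letter; U+1D11E needs a surrogate pair in UTF-16.
for sample in ["h\u00e9llo", "\U0001D11E"]:
    print(len(sample),
          len(native_bytes(sample, "unix")),
          len(native_bytes(sample, "windows")),
          utf16_units(sample))
# 5 6 10 5
# 1 4 4 2
```

Neither encoding is fixed-width per code point (the second sample takes two UTF-16 units), so the choice between them is mostly about which OS boundary gets to skip the conversion.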