Re: [fpc-devel] Unicodestring branch, please test and help fixing #2

2008-09-16 Thread ABorka

After reading the mailing lists more I've played a little bit with this 
and it seems that with Zeos (MySQL5) it is working out of the box at 
first sight (need some more tests) if one sets the 
ZConnection.Properties to


character_set_client=utf8
character_set_connection=utf8
character_set_database=utf8
character_set_results=utf8
character_set_server=utf8
character_set_system=utf8
collation_connection=utf8_general_ci
collation_database=utf8_general_ci
collation_server=utf8_general_ci
Codepage=utf8

(suggested by Ivan Gan on the Lazarus mailing list back in July)
No matter whether the database/table is UTF8 encoded or not, the returned 
values are OK for the Lazarus controls as well as for the 
FieldByName...AsString string assignments. For me it seems to work 
both ways (reading from the tables and writing to the tables with SQL 
statements).


If I'm using fcl-db (the SQLdb Lazarus package), nothing I tried 
worked out of the box. Unless I used the 
ConvertEncoding(SomeStringFromTheTable, 'cp1252', 'utf8') function, I 
could not get any string value from the table fields. It did not matter 
whether the database/table was UTF8 encoded or not.
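
For readers outside FPC, the conversion step can be illustrated with a minimal sketch. It is written in Python, not Lazarus code, and it assumes the table delivers raw cp1252 bytes while the consumer expects UTF-8, which is the situation ConvertEncoding(..., 'cp1252', 'utf8') is used for here. The helper name and the sample value are made up for illustration.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode cp1252 bytes as UTF-8 (hypothetical helper)."""
    # Decode with the code page the database actually uses ...
    text = raw.decode("cp1252")
    # ... and encode in the encoding the UI layer expects.
    return text.encode("utf-8")


# "Gaertner" with a-umlaut, as a cp1252 table stores it: one byte per character.
field_value = b"G\xe4rtner"
converted = cp1252_to_utf8(field_value)

print(converted.decode("utf-8"))         # the text survives the conversion
print(len(field_value), len(converted))  # the UTF-8 form is equal or longer
```

The length difference is the point JoshyFun raises further down: buffers sized for the code page may be too small for the UTF-8 result.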

I'm sure it can be made to work somehow.

AB

Joost van der Sluis wrote:
> On Friday 12-09-2008 at 15:56 [timezone +0200], Mattias Gärtner wrote:
> > Quoting Joost van der Sluis <[EMAIL PROTECTED]>:
> > > On Friday 12-09-2008 at 13:22 [timezone +0200], JoshyFun wrote:
> > > > A> Thanks for pointing me to the Lazarus thread about this and the bug
> > > > A> report. Checked them.
> > > > A> But as I understand there is no solution available at the moment for
> > > > A> this.
> > > >
> > > > I had partially solved the problem using the handler "OnGetText" ?
> > > > (I'm not sure about the name) for each field which is somehow dirty
> > > > forcing a codepage to UTF8 conversion (in Lazarus you will find some
> > > > codepage<->UTF conversions available).
> > >
> > > I think that the original poster didn't look very well in the
> > > archives, this solution is told here quite often.
> > >
> > > > A> I have a database that is not encoded utf8 (and it will never be because
> > > > A> other client programs are accessing it and their users do not want/need
> > > > A> to be converted to unicode). How do I get the field values into
> > > > A> FPC/Lazarus into a string variable? Right now the non-unicode strings
> > > > A> are returned as empty from a database field due to FCL conversion
> > > > A> functions.
> > > >
> > > > If you will need this as a fixed solution for this project maybe you
> > > > can think about creating a new database unit file based on the current one
> > > > (change the name of course) with hardcoded UTF8 encoding from codepage
> > > > for each string once retrieved from the database. Take care about
> > > > string length as UTF8 ones will be equal or longer than the original
> > > > ones.
> > >
> > > You can just override one single method to do this. This is also told a
> > > few times on this list.
> >
> > Maybe it is not documented at the right place?
>
> It is not documented at all. Just like the rest of the database-stuff.
> But maybe I should write a FAQ for fpc. With the new lazarus-versions
> using UTF-8 by default, this is asked quite often.
>
> Joost

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel




Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-14 Thread Paul Ishenin


Florian Klaempfl wrote:
> No. First, imo it's more likely that the next release will be 2.4.0 
> and not 2.2.4, further, the changes are too big.

Do you have a todo for it? IOW, what is missing to start releasing 
2.4.0 (we need resources changes ;) )


Best regards,
Paul Ishenin.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-14 Thread Florian Klaempfl


Martin Schreiber wrote:
> On Sunday 14 September 2008 19.22:13 Florian Klaempfl wrote:
> > Martin Schreiber wrote:
> > > I tried with trunk, same result. The problem is probably that the second
> > > constant string parameter has a wrong reference count. It is initially 0
> > > instead of -1. The incref call at begin of winfilepath turns it to 1,
> > > decref in finalize section of winfilepath tries to free the constant
> > > string memory -> bumm.
> >
> > Fixed in rev 11779. Thanks for the test.
>
> Win32 MSEide works now with UnicodeString, no problems found up to now. :-)
> Thanks a lot!
> Do you plan to merge to fixes_2_2?

No. First, imo it's more likely that the next release will be 2.4.0 and 
not 2.2.4, further, the changes are too big.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-14 Thread Martin Schreiber

On Sunday 14 September 2008 19.22:13 Florian Klaempfl wrote:
> Martin Schreiber wrote:
> > I tried with trunk, same result. The problem is probably that the second
> > constant string parameter has a wrong reference count. It is initially 0
> > instead of -1. The incref call at begin of winfilepath turns it to 1,
> > decref in finalize section of winfilepath tries to free the constant
> > string memory -> bumm.
>
> Fixed in rev 11779. Thanks for the test.

Win32 MSEide works now with UnicodeString, no problems found up to now. :-)
Thanks a lot!
Do you plan to merge to fixes_2_2?

Martin

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-14 Thread Florian Klaempfl


Martin Schreiber wrote:
> On Thursday 11 September 2008 23.18:07 Florian Klaempfl wrote:
> > Martin Schreiber wrote:
> > > On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote:
> > >
> > > I have a crash in MSEide startup in a procedure finalization section:
> > > [...]
> > > I saw that you merged unicodestring to trunk. Should I test with trunk
> > > instead of unicodestring branch?
> >
> > Yes. Unicodestring branch is closed.
>
> I tried with trunk, same result. The problem is probably that the second 
> constant string parameter has a wrong reference count. It is initially 0 
> instead of -1. The incref call at begin of winfilepath turns it to 1, decref 
> in finalize section of winfilepath tries to free the constant string 
> memory -> bumm.

Fixed in rev 11779. Thanks for the test.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-13 Thread Martin Schreiber

On Thursday 11 September 2008 23.18:07 Florian Klaempfl wrote:
> Martin Schreiber wrote:
> > On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote:
> >
> > I have a crash in MSEide startup in a procedure finalization section:
[...]
> > I saw that you merged unicodestring to trunk. Should I test with trunk
> > instead of unicodestring branch?
>
> Yes. Unicodestring branch is closed.

I tried with trunk, same result. The problem is probably that the second 
constant string parameter has a wrong reference count. It is initially 0 
instead of -1. The incref call at begin of winfilepath turns it to 1, decref 
in finalize section of winfilepath tries to free the constant string 
memory -> bumm.
Testresult:
"
-1
1
An unhandled exception occurred at $77892373 :
EAccessViolation : Access violation
  $77892373
  $778922F8
  $0040A214
  $004097A3
  $004098DD
  $004099DB
  $00408844
  $004068EA
  $0040696D
  $00401858
  $004018D5
"
The crash stack:
"
#0  77892373 :0 ??()
#1  00416E04 :0 U_SYSTEM_ENTRYINFORMATION()
#2  004196D4 :0 U_SYSINITPAS_ENTRYINFORMATION()
#3  004162C4 :0 U_SYSTEM_OUTPUT()
#4  014CFE20 :0 ??()
#5  00408799 sysheap.inc:38 SYSOSALLOC(SIZE=0)
#6  778922F8 sysheap.inc:0 ??()
#7  00416F00 sysheap.inc:0 U_SYSTEM_ORPHANED_FREELISTS()
#8  0040A762 systhrd.inc:300 SYSENTERCRITICALSECTION(CS=void)
#9  0040A214 thread.inc:190 ENTERCRITICALSECTION(CS={DEBUGINFO = 0x0, 
LOCKCOUNT = 1, RECURSIONCOUNT = 0, OWNINGTHREAD = 0, LOCKSEMAPHORE = 944, 
SPINCOUNT = 0})
#10  004097A3 heap.inc:1034 WAITFREE_VAR(PMCV=0x41312c)
#11  004098DD heap.inc:1086 SYSFREEMEM_VAR(LOC_FREELISTS=0x416f84, 
PMCV=0x41312c)
#12  004099DB heap.inc:1125 SYSFREEMEM(P=0x413138)
#13  00408844 heap.inc:275 FREEMEM(P=0x413138)
#14  004068EA ustrings.inc:179 DISPOSEUNICODESTRING(S=0x413138)
#15  0040696D ustrings.inc:206 fpc_unicodestr_decr_ref(S=0x413138)
#16  00401858 decrefcrash.pas:63 WINFILEPATH(DIRNAME=0x0, FILENAME=0x413138, 
result=0x14fdab0)
#17  004018D5 decrefcrash.pas:69 main()
"
And there are calls to fpc_WideStr_Decr_Ref I don't understand.
Test program attached.

Martin
program decrefcrash;
{$ifdef FPC}{$mode objfpc}{$h+}{$endif}
{$ifdef mswindows}{$apptype console}{$endif}
uses
 {$ifdef FPC}{$ifdef linux}cthreads,{$endif}{$endif}
 sysutils;

const
 maxdatasize = $7fff; 
type
 msechar = unicodechar;
 msestring = unicodestring;
 msecharaty = array[0..maxdatasize div sizeof(msechar)-1] of msechar;
 pmsecharaty = ^msecharaty;

procedure replacechar1(var dest: msestring; a,b: msechar);
  //replaces a by b
var
 int1: integer;
begin
 uniquestring(dest);
 for int1:= 0 to length(dest)-1 do begin
  if pmsecharaty(dest)^[int1] = a then begin
   pmsecharaty(dest)^[int1]:= b;
  end;
 end;
end;


function winfilepath(dirname,filename: msestring): msestring;
begin
 writeln((pptrint(pointer(dirname))-2)^);
 flush(output);
 writeln((pptrint(pointer(filename))-2)^);
 flush(output);
 replacechar1(dirname,msechar('/'),msechar('\'));
 replacechar1(filename,msechar('/'),msechar('\'));
 if (length(dirname) >= 3) and (dirname[1] = '\') and (dirname[3] = ':') then begin
  dirname[1]:= dirname[2]; // '/c:' -> 'c:\'
  dirname[2]:= ':';
  dirname[3]:= '\';
  if (length(dirname) > 4) and (dirname[4] = '\') then begin
   move(dirname[5],dirname[4],(length(dirname) - 4)*sizeof(msechar));
   setlength(dirname,length(dirname) - 1);
  end;
 end;
 if filename <> '' then begin
  if dirname = '' then begin
   result:= '.\'+filename;
  end
  else begin
   if dirname[length(dirname)] <> '\' then begin
    result:= dirname + '\' + filename;
   end
   else begin
    result:= dirname + filename;
   end;
  end;
 end
 else begin
  result:= dirname;
 end;
end;

var
 mstr1,mstr2: msestring;
begin
 mstr2:= 'C:\Dokumente und Einstellungen\mseca\Anwendungsdaten\.mseide';
 mstr1:= winfilepath(mstr2,'*');
end.
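
The failure mode described in this message can be modelled in a few lines. This is a toy sketch in Python, not FPC's actual string header or RTL code: it only assumes what the message states, namely that a reference count of -1 marks a constant string that must never be freed, and that a routine increments its parameter's count on entry and decrements it on exit. All names are hypothetical.

```python
CONSTANT = -1  # sentinel the message describes for constant strings


class StrHeader:
    """Toy model of a reference-counted string header (not FPC's real layout)."""

    def __init__(self, refcount):
        self.refcount = refcount
        self.freed = False


def incr_ref(s):
    if s.refcount == CONSTANT:
        return              # constants are never counted
    s.refcount += 1


def decr_ref(s):
    if s.refcount == CONSTANT:
        return              # ... and therefore never freed
    s.refcount -= 1
    if s.refcount == 0:
        s.freed = True      # stands in for FreeMem on the string data


def call_with(s):
    """Model a routine that increfs its parameter on entry and decrefs on exit."""
    incr_ref(s)
    decr_ref(s)
    return s.freed


print(call_with(StrHeader(CONSTANT)))  # constant marked correctly: stays alive
print(call_with(StrHeader(0)))         # constant wrongly starting at 0: gets freed
```

In the second call the count goes 0 -> 1 -> 0, so the model "frees" memory that was never allocated on the heap, which corresponds to the access violation in the stack trace above.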

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-12 Thread ABorka

> It is not documented at all. Just like the rest of the database-stuff.
> But maybe I should write a FAQ for fpc. With the new lazarus-versions
> using UTF-8 by default, this is asked quite often.

This would be really nice.

I know I'm not the only one who doesn't want to spend days on hacking 
and debugging the components and FCL code to find out why the database 
field values disappear/morph before reaching my program code when they 
didn't do so before. People will start using these new unicode based 
development tools and this problem will be there for all of them (and 
the problem is not only with the DBAware components but also with a 
simple FieldByName...AsString put into a normal control).

A transparent solution would be the best - like the FCL doing conversions 
back and forth automatically from the database codepage when asked to - 
but I guess that is too much to ask for. :) Maybe not even possible.

Thank you for the help, guys. I'll try to dig up more info from the 
mailing list archives when I have time.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-12 Thread listmember


Sorry, but I meant comparing with collation. I did not mean comparing
within language context.


How can you do /proper/ collation while ignoring the language context?


1) 'sıkıcı' which means 'boring' in English (notice the dotless small
'i's)

2) 'sikici' which means 'fucker' in English



Depends how you normalize. Normalize should substitute all *equal*
letters (or combination thereof) into one single form. That allows
comparing and matching them.


Again, we're not quite on the same page here...

What you're referring to is more like 'Text Normalization' [ 
http://en.wikipedia.org/wiki/Text_normalization ] where you do 
definitely need a very comprehensive dictionary so that '1' is equal to 
'one' and '1st' is 'first', etc. (if your language is English).


Whereas, what I am referring to is 'Unicode Normalization' [ 
http://en.wikipedia.org/wiki/Unicode_normalization ].


This one is much narrower in scope. It deals basically with what I can 
refer to as 'character glyphs'.


Now, from what I understand from the definitions of 'Unicode 
Normalization' there are 2 ways of doing it:


1) You decompose both texts (so that you have all 'weird' characters 
expanded into their combining characters)


2) You compose both texts (so that, you have as few or no combining 
characters)


This is done, obviously, to get them both in the same format --to make 
life easier to compare.


If you do no other operation on these two texts before you compare them, 
this is called Canonical Equivalence Test --each 'character glyph' in 
each text must be the same.


For Canonical Equivalence Test, you do not need to have any 'language' 
attribute --after all, you're doing a simple byte-wise test.


On the other hand, if you wish to do a broader comparison, 
Compatibility Equivalence Test or something other, you will need to do a 
little more work on those texts:


Normalization is one of them. I suggest you take a look at the 
'Normalization' heading under 
http://en.wikipedia.org/wiki/Unicode_normalization


Trouble with the 'Normalization' described there is, it is far too crude 
for quite a lot of purposes.


A better form of comparison is converting both texts to either uppercase 
or to lowercase.


And, once we do this, we hit two walls (or obstacles) to overcome. The 
steps I can think of are:


1) Equivalent code points. We need first to 'compose' the text and then 
substitute the relevant (and preferred) equivalent code points for any 
'character glyph's in the texts.


2) We also need to take care of stuff like language dependent case 
transforms. See http://en.wikipedia.org/wiki/Turkish_dotted_and_dotless_I


As far as I know, this is the only 'proper' thing to do for search and 
comparison operations under unicode.


I know it will be slower, but, that is the price to pay.

Note: The reason I used the term 'character glyphs' is that several 
code points can be combined to make a 'character glyph'.


See the definition of Code Point [ http://unicode.org/glossary/ ] which 
says:


"Code Point: Any value in the Unicode codespace; that is, the range of 
integers from 0 to 10FFFF₁₆."


As an example, from the above Wiki article, we can use 2 code points to 
produce a 'character glyph', such as


'n' + '~' --> ñ
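
The compose/decompose step can be tried out with a short sketch. It uses Python's standard unicodedata module purely as an illustration (nothing here is FPC API), and it assumes the '~' above stands for the combining tilde U+0303 rather than the ASCII tilde.

```python
import unicodedata

decomposed = "n\u0303"   # 'n' followed by U+0303 COMBINING TILDE
composed = "\u00f1"      # the precomposed n-with-tilde

# The raw strings differ, although both render as the same 'character glyph'.
print(decomposed == composed)

# Bringing both sides into one normal form makes them comparable.
nfc = unicodedata.normalize("NFC", decomposed)   # compose
nfd = unicodedata.normalize("NFD", composed)     # decompose
print(nfc == composed, nfd == decomposed)
print(len(nfc), len(nfd))                        # number of code points in each form
```

Either form works for a canonical equivalence test, as long as both texts are normalized the same way before the comparison.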


But yes, even this is very limited (busstop), because even if you know
the language of the word (German in my example) you do not know its
meaning.


You do not worry about the meaning at all. In all languages (I guess) 
there are several words that may be written the same but mean different 
things.



Without a full dictionary, you do not know if ss and german-sharp-s are
the same or not.


True. But, if you do know it is in German, then you definitely know they 
are. And, this makes a lot of difference.



So basically what you want to do, can only be done with a full
dictionary. Or you have to accept false positives.


Nope. No false positives in text level.

You can always, of course, get false positives in semantic level --such 
as when you're looking for 'apple' (the fruit) and 'Apple' (the brand 
name), but that's a completely different problem.



I also fail to see why a utf8 string is a half baked solution. It will
serve most people fine. It can be extended for those who want more.


I have nothing against UTF-8 or any other encoding schemes. It is just 
that --an encoding scheme. Most handy as a means of transporting data from 
one medium/app to another.


But, UTF-8 does in no way cover the whole of Unicode, nor is it a complete 
solution for dealing with unicode. It is, after all, an encoding scheme.



BUT of course there is no way to deal with the ambiguous "Busstop"


Not even if you knew that "Busstop" was a German string?


Indeed. For this case, you need to know what language "Busstop" was
written in.

you need a dictionary. knowing it is German is not enough. because all
that "it is german" tells you is, that "ss" maybe a sharp-s, but doesn't
have to be


A dictionary, then, wouldn't help you eit

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-12 Thread Martin Friebe


listmember wrote:

IMHO The discussion splits here between:
1) How can this be done in a specific app
2) what should fpc provide

as for 2: This would be on top of yet (afaik) missing basic functions
such as
Compare using collation x (where collation is given as argument to
compare, not as part of any string)
I think we're beginning to be on the same page --but, please, can you 
refrain from using the word 'collation'; every time I see that in this 
context, I feel a strong need to open the window and shout "collation 
isn't the most important/used part of a language wrt programming" :)
Sorry, but I meant comparing with collation. I did not mean comparing 
within language context.


language context is too complex to be basic (see busstop below)

2) actual compare, you need to "normalize" all strings before comparing,
then compare the normalized string as bytes.

normalizing means for each char to decide how to represent it. German
"ae" could be represented as a umlaut for the compare.
Or (in German text) you expand all umlaute first.


IOW, SameText() and similar stuff must take normalization into account.

But, you do know that 'normalization' is a very rough assumption and 
can land you in some very embarrassing situations.


Here are 2 words from Turkish.

1) 'sıkıcı' which means 'boring' in English (notice the dotless small 
'i's)


2) 'sikici' which means 'fucker' in English
Depends how you normalize. Normalize should substitute all *equal* 
letters (or combination thereof) into one single form. That allows 
comparing and matching them.
But yes, even this is very limited (busstop), because even if you know 
the language of the word (German in my example) you do not know its meaning.


Without a full dictionary, you do not know if ss and german-sharp-s are 
the same or not.
So basically what you want to do, can only be done with a full 
dictionary. Or you have to accept false positives.


I also fail to see why a utf8 string is a half baked solution. It will 
serve most people fine. It can be extended for those who want more.


IMHO this is a case for an add-on library.
And apparently no one has yet volunteered to write it



Now, when you normalize these you get 'SIKICI' for both which --then-- 
you would assume to be the same.



BUT of course there is no way to deal with the ambiguous "Busstop"


Indeed. For this case, you need to know what language "Busstop" was 
written in.
You need a dictionary. Knowing it is German is not enough, because all 
that "it is German" tells you is that "ss" may be a sharp-s, but doesn't 
have to be.

What I can not do (or what I do not want to do) is to decide which of
them other people do want to use.

But, isn't this just that: IOW, you're deciding what other people will
NOT want to use if you throw the 'language' attribute (for each char)
out of the window..

True, I am happy to do that. NOT

I am glad we have met :)

have we? I remember a mail conversation, but not an actual meeting :) SCNR

Why you can always extend this. Store you string in any of the following
ways
1) every 2nd char is a language attribute, not a char
2) store the language attributes in a 2nd string, always pass both
strings around


Of course, these and even more creative hacks could be devised.
The question is, is the language an attribute of a unicode character?

(I assume "mandatory attribute")

Well as much as it is or is not an attribute of a latin1 or iso-whatever 
char.


I do not think it is. I have no proof. But a lot of people seem to think 
so: if I google Unicode (or any other char/latin/iso...) I get nice 
character tables, and no language info.



Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-12 Thread listmember


[Note that, here 'TCharacter' isn't necessarily an object; it might as
well be a simple record structure.]


AFAIK for most programmers this is not a common task. Most programs need less
(one language or codepage)


But, when you're talking unicode, codepage is rather meaningless --isn't it?


or more (phonetic, semantic, statistical search).
Can you explain, why you think that this particular problem requires compiler
magic?


See my other reply to Martin Friebe, in another sub thread.


Are there, in Unicode, start-stop markers that denote 'language'?


Is it needed?
Are there any unicode characters whose upper/lower case depends on language?


Yes. See my other reply to Martin Friebe, in another sub thread.


Take this, for example:

"if SameText(SomeString, SomeOtherString) then do ..."

For this to work properly, in both 'SomeString' and 'SomeOtherString',
you need to know which language *each* character belongs to.


Comparing texts can be done with various meanings. For example: byte comparison,
simple case insensitive comparison, not literal comparison, compare like this
library, 
Which one do you mean?


Byte comparison isn't what I am worried about.

In every language, there are pretty well-known and fixed (by now) rules that 
apply to string comparison. I am referring to those rules.



[...]
Here is a simple example for you:

"if SameText('I am on FoolStrasse', 'I am on FoolStraße') then do ..."

Now.. how are you going to decide that SameText() function here returns
true unless you have information that the substring 'FoolStraße' is in
German?


The two strings have the same language, but are written with different
Rechtschreibung. You need dictionaries and spelling systems to implement such
comparisons. This is beyond a compiler or a RTL.


Are you sure? I was under the impression that Unicode covers these 
--without needing further data.



What about loan words?


For all practical purposes, 'loan words' belong to the language they are 
used in.


Except the case where we'd be discussing etymology.


SameText('Istanbul', 'istanbul') can only return true when both
'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani.

Otherwise, the same SameText() has to return false.


I doubt that it is that easy.


Well.. I never said that it would be that easy.

But, if you strip off the language attribute from the character, it will be 
impossible --or several orders of magnitude harder for those people who 
need it.


You can, of course, ignore all that.

But, then, what is the point of going unicode?

We were just fine doing things ANSI-centric..

Weren't we?

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-12 Thread listmember


Actually for your example case doesn't matter, as you need to decide if
"ss" = "ß"


And, this is only valid in German. For all other, the result must either 
be false, or undefined.



Are there, in Unicode, start-stop markers that denote 'language'?



I do not know, that was why I said "unused unicode" and "implemented on
top" (as part of the specific app)


As far as I know, there isn't a language delimiter in Unicode.


IMHO The discussion splits here between:
1) How can this be done in a specific app
2) what should fpc provide

as for 2: This would be on top of yet (afaik) missing basic functions
such as
Compare using collation x (where collation is given as argument to
compare, not as part of any string)


I think we're beginning to be on the same page --but, please, can you 
refrain from using the word 'collation'; every time I see that in this 
context, I feel a strong need to open the window and shout "collation 
isn't the most important/used part of a language wrt programming" :)



Take this, for example:

"if SameText(SomeString, SomeOtherString) then do ..."
For this to work properly, in both 'SomeString' and 'SomeOtherString',
you need to know which language *each* character belongs to.



I would rather say:
"There are special cases where you need/want to know which language"


Yes. And, if we're on our way to make FPC unicode-enabled, we need to 
take these special cases into account. Otherwise, we will likely end up 
with a half baked 'solution'.



So I do not imply how special or none special those cases are => you do
not always need to know. (continued below on your example)


Why would I ALWAYS need it? Isn't 'needed when necessary' good 
enough?



2) actual compare, you need to "normalize" all strings before comparing,
then compare the normalized string as bytes.

normalizing means for each char to decide how to represent it. German
"ae" could be represented as a umlaut for the compare.
Or (in German text) you expand all umlaute first.


IOW, SameText() and similar stuff must take normalization into account.

But, you do know that 'normalization' is a very rough assumption and 
can land you in some very embarrassing situations.


Here are 2 words from Turkish.

1) 'sıkıcı' which means 'boring' in English (notice the dotless small 'i's)

2) 'sikici' which means 'fucker' in English

Now, when you normalize these you get 'SIKICI' for both which --then-- 
you would assume to be the same.
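
The collision is easy to reproduce. The sketch below is in Python for illustration only; its built-in upper() applies language-neutral case mapping, and turkish_upper is a hypothetical helper that assumes just the two Turkish rules (i -> İ, ı -> I).

```python
# Hypothetical helper: Turkish-aware upper-casing (i -> I-with-dot, dotless i -> I).
TURKISH_UPPER = {ord("i"): "\u0130", ord("\u0131"): "I"}


def turkish_upper(s: str) -> str:
    return s.translate(TURKISH_UPPER).upper()


boring = "s\u0131k\u0131c\u0131"   # the first word, written with dotless i
other = "sikici"                   # the second word, written with dotted i

# Language-neutral upper-casing folds both words into the same string.
print(boring.upper() == other.upper())

# With the Turkish rules applied first, they stay distinct.
print(turkish_upper(boring) == turkish_upper(other))
```

The first comparison is the false match described above; the second shows that it disappears once the case mapping knows the text is Turkish.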


Well.. I'd like to see you (or your boss) when you've come up with all 
those 'fucker's instead of all those 'boring' old farts you were looking 
for :P


[You might probably think of a German --or some other language-- example]

IOW, what I am trying to tell you is that normalization isn't really 
useful --it is, IMO, a stopgap solution along the path of Unicode evolution.



BUT of course there is no way to deal with the ambiguous "Busstop"


Indeed. For this case, you need to know what language "Busstop" was 
written in.



What I can not do (or what I do not want to do) is to decide which of
them other people do want to use.

But, isn't this just that: IOW, you're deciding what other people will
NOT want to use if you throw the 'language' attribute (for each char)
out of the window..



True, I am happy to do that. NOT


I am glad we have met :)


Why, you can always extend this. Store your string in any of the following
ways
1) every 2nd char is a language attribute, not a char
2) store the language attributes in a 2nd string, always pass both
strings around
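
Option 2 can be sketched as follows, in Python for illustration. The tagging convention is an assumption made up for this example: the second string has one tag character per text character, 't' meaning Turkish and anything else meaning "use the default mapping".

```python
def upper_tagged(text: str, tags: str) -> str:
    """Upper-case `text` using a parallel string of per-character language tags."""
    if len(text) != len(tags):
        raise ValueError("text and tags must have the same length")
    turkish = {"i": "\u0130", "\u0131": "I"}   # i -> I-with-dot, dotless i -> I
    out = []
    for ch, tag in zip(text, tags):
        if tag == "t" and ch in turkish:
            out.append(turkish[ch])            # Turkish rule for this character
        else:
            out.append(ch.upper())             # default rule
    return "".join(out)


text = "istanbul istanbul"
tags = "d" * 9 + "t" * 8    # first word (and the space) German, second word Turkish
print(upper_tagged(text, tags))
```

The cost is visible in the signature: every routine that touches the text has to receive and maintain the second string as well.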


Of course, these and even more creative hacks could be devised.

The question is, is the language an attribute of a unicode character?


SameText('Istanbul', 'istanbul') can only return true when both
'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani.



ok that's what I did not know. But still in most cases it will be fine to do
SameText('Istanbul', 'istanbul', lGerman)
SameText('Istanbul', 'istanbul', lTurkish)
decide at the time of comparing


Well, the prototype I had in mind was:

SameText('Istanbul', 'istanbul', lGerman, lTurkish)

where the defaults for the latter 2 parameters would be lUnknown --this 
way, people who needn't be bothered about these would not even notice.



If however the info was stored on the string (or char) what if one was
Turkish, the other German ?


SameText('Istanbul', 'istanbul', lGerman, lTurkish)

This one must return FALSE since, in Turkish, uppercased dotted small 
'i' is DOTTED capital 'i' (i.e. 'İ').


and,

SameText('Istanbul', 'istanbul', lTurkish, lGerman)

will return TRUE since uppercasing both sides result in the same string.
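
A four-argument comparison of this kind could look roughly like the sketch below. It is Python, not a proposal for the actual RTL signature; same_text, upper_for and the language codes 'de'/'tr' are hypothetical, and the only language-specific rule modelled is the Turkish i/İ and ı/I pairing. Each argument is upper-cased under its own language before the comparison, as described above.

```python
from typing import Optional

TURKISH_UPPER = {ord("i"): "\u0130", ord("\u0131"): "I"}


def upper_for(s: str, lang: Optional[str]) -> str:
    """Upper-case `s` under the rules of `lang` ('tr' or anything else)."""
    if lang == "tr":
        s = s.translate(TURKISH_UPPER)
    return s.upper()


def same_text(a: str, b: str,
              lang_a: Optional[str] = None,
              lang_b: Optional[str] = None) -> bool:
    # Languages default to None ("unknown"), so callers who do not care
    # about them get the plain language-neutral comparison.
    return upper_for(a, lang_a) == upper_for(b, lang_b)


print(same_text("Istanbul", "istanbul"))              # no language given
print(same_text("Istanbul", "istanbul", "de", "tr"))  # second word is Turkish
print(same_text("Istanbul", "istanbul", "tr", "de"))  # first word is Turkish
```

Under these assumptions only the call where the lowercase word is tagged as Turkish reports a difference, because that is the only one in which a dotted small 'i' gets upper-cased by the Turkish rule.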

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-12 Thread Martin Friebe


listmember wrote:

Martin Friebe wrote:

Just to make sure, all of this discussion is based on various collation

No part of this discussion is based on collation.

Ok, so we were talking about different things


Here is a scenario for you:

You have multilanguage text as data. Someone has asked you to search 
it and see if a certain piece of string (in a given language) exists 
in it.

This search needs to be NOT case-sensitive.
Actually for your example case doesn't matter, as you need to decide if 
"ss" = "ß"

How can you do this?
Is it doable if TCharacter (or whatever you call it) has no 'language' 
attribute?


For the purpose of case-sensitivity, I still do not know of a character 
(or rather a pair of upper and lower case chars) that maps differently in 
some languages.
Is there a pair of characters "x" and "X" which should in some languages 
be matching upper/lower, but in other languages should not?

^^ ignore, found your example at the end of mail

Otherwise how do I understand the case-insensitive part of your 
question? Because if "x" is the lowercase of "X" in *all* languages, 
then I do not need the language specific info to do the 
none-case-sensitive compare.


Sorry if I am still missing some point...

> [Note that, here 'TCharacter' isn't necessarily an object; it might as
> well be a simple record structure.]

Yes, we agreed on this part.

>> Besides instead of storing it per char, you can use unused unicode as
>> start/stop markers. So it can be implemented on top of a string that
>> stores unicode-chars (and chars only, no attributes)
>
> Are there, in Unicode, start-stop markers that denote 'language'?

I do not know; that was why I said "unused unicode" and "implemented on
top" (as part of the specific app).
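
The "markers inside the string" idea could be sketched as follows. This is Python, purely for illustration, and everything in it is an application-level assumption: the two Private Use Area code points and the helpers `tag` and `split_runs` are made up here and are not defined by Unicode or by FPC. (Unicode does define U+E0001 LANGUAGE TAG together with tag characters, but their use for language tagging is deprecated.)

```python
LANG_START = "\uE000"  # hypothetical marker: a language code follows
LANG_END = "\uE001"    # hypothetical marker: end of the language code


def tag(lang: str, text: str) -> str:
    """Prefix text with an in-band language marker."""
    return f"{LANG_START}{lang}{LANG_END}{text}"


def split_runs(marked: str, default_lang: str = "und"):
    """Split a marked string into (lang, text) runs, dropping the markers."""
    runs = []
    lang = default_lang
    pos = 0
    while pos < len(marked):
        start = marked.find(LANG_START, pos)
        if start == -1:
            # No further marker: the rest belongs to the current language.
            runs.append((lang, marked[pos:]))
            break
        if start > pos:
            runs.append((lang, marked[pos:start]))
        end = marked.find(LANG_END, start)
        if end == -1:
            raise ValueError("unterminated language marker")
        lang = marked[start + 1:end]
        pos = end + 1
    return runs


marked = tag("en", "I am on ") + tag("de", "FoolStraße")
print(split_runs(marked))
```

The string itself stays a plain sequence of code points; only code that cares about language needs to know the marker convention.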


IMHO the discussion splits here between:
1) How can this be done in a specific app
2) What should fpc provide

As for 2: this would be on top of basic functions that are (afaik) still
missing, such as
Compare using collation x (where the collation is given as an argument to
compare, not as part of any string)

>> Why is language intrinsic to the text? An "A" is an "A" in any language.
>> At best language is intrinsic to sorting/comparing(case on non
>> case-sense) text
>
> Comparing is a lot more important an operation than collating --or,
> rather, collation is achievable only if you can do proper comparisons.
>
> Take this, for example:
>
> "if SameText(SomeString, SomeOtherString) then do ..."
> For this to work properly, in both 'SomeString' and 'SomeOtherString',
> you need to know which language *each* character belongs to.

I would rather say:
"There are special cases where you need/want to know which language"

So I do not imply how special or non-special those cases are => you do
not always need to know. (continued below on your example)




> If you don't have that information, you might as well not have a
> SameText() function in FPC.

>>> Please note the 'case-INsensitive' keyword there.
>>
>> Well I needed an actual example where case sense differs by language
>> (assuming we talk about languages using the same charset, not comparing
>> Chinese with English).
>
> Here is a simple example for you:
>
> "if SameText('I am on FoolStrasse', 'I am on FoolStraße') then do ..."

Well, that is a good question: do you always want that to return the same?
"Busstop" and "Bußtop" (yes, the second is not a word, but it could occur
in a text)

Also, in names this comparison does not always apply.

The name "Heiße" (originally with ß) can be spelled as "Heisse".
But the name "Heisse" (originally with "ss") is never the same as "Heiße".

But since you ask me: this is a specialized comparison, similar to soundex
(which compares the sound of 2 words, usually based on English).
Something like this is usually found in extension libraries, but not in
the standard functionality of many/most languages.

In any case I think this also has the minority problem. Most people do
not want to compare pascal strings this way (if only because of false
positives).
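
As a data point: Unicode's default full case folding maps 'ß' to 'ss', so a folding-based compare treats the two spellings as equal regardless of where a name came from, which is the false-positive concern above. A small Python illustration (Python's `str.casefold` implements that folding; this is not an FPC function, and `same_text_folded` is a name invented here):

```python
def same_text_folded(a: str, b: str) -> bool:
    """Compare using Unicode full case folding (language-independent)."""
    return a.casefold() == b.casefold()


print(same_text_folded("I am on FoolStrasse", "I am on FoolStraße"))  # True
print(same_text_folded("Heisse", "Heiße"))                            # True
# Simple lowercasing keeps 'ß' as it is, so it does not equate the spellings:
print("Heisse".lower() == "Heiße".lower())                            # False
```

Whether the first behaviour or the last is "right" depends on the application, which is the argument for keeping such comparisons in an optional unit.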




That does not mean that I say such functionality is not desirable. It 
would be great to have a unit that can be used if needed.


Based on the idea that these are optional (or 3rd party) functions, the 
normal String would not provide for this. (Besides, attaching info to 
each char would probably be too costly, even if implemented in the fpc 
core string.)
Functions like this could take an additional structure declaring the 
start/stop/change point of every language.




>> In any case, I can write up several different algorithms how to do that.
>
> Please do. SameText(), for one, will need all the help it can get.

The initial comment was based on collation, and basically would have 
been about prioritizing in conflicts.


There are 2 parts:
1) identifying the language.

I would recommend a separate structure, with all language start points. 
It takes some work to maintain, but should work.
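
A separate structure of language start points could look roughly like this. The sketch is in Python for brevity (not Pascal), and the `LanguageRuns` class and its method names are invented for illustration:

```python
from bisect import bisect_right


class LanguageRuns:
    """Plain text plus a sorted list of (start_index, language) change points."""

    def __init__(self, text: str, default_lang: str = "und"):
        self.text = text
        self.starts = [0]            # sorted start offsets
        self.langs = [default_lang]  # language in effect from each offset

    def set_lang_from(self, start: int, lang: str) -> None:
        """Declare that `lang` applies from `start` up to the next change point."""
        i = bisect_right(self.starts, start)
        if self.starts[i - 1] == start:
            self.langs[i - 1] = lang
        else:
            self.starts.insert(i, start)
            self.langs.insert(i, lang)

    def lang_at(self, index: int) -> str:
        """Return the language of the character at `index`."""
        return self.langs[bisect_right(self.starts, index) - 1]


s = LanguageRuns("I am on FoolStraße", default_lang="en")
s.set_lang_from(8, "de")
print(s.lang_at(0), s.lang_at(8), s.lang_at(17))  # en de de
```

The maintenance cost mentioned above shows up as soon as the text is edited: every insertion or deletion has to shift the offsets behind it, which this sketch leaves out.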


Alternatively, use a dynarray instead of a string. Define a record holding 
all the info per char that you need, and overload all operators

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-12 Thread Florian Klaempfl

Daniël Mantione wrote:
> 
> 
> On Fri, 12 Sep 2008, listmember wrote:
> 
>> This search needs to be NOT case-sensitive.
>>
>> How can you do this?
>>
>> Is it doable if TCharacter (or whatever you call it) has no 'language'
>> attribute?
> 
> 'I am on FoolStrasse' versus 'I am on FoolStraße' is not an upper/lower
> case issue. Strasse and Straße have the same casing. So yes, you can do
> case-insensitive search.
> 
> The problem you describe does exist. ü and ue are equivalent in German,

Not in both directions.

> but not in Dutch. So someone searching for ü will also want to receive
> results for ue,

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-12 Thread Daniël Mantione




On Fri, 12 Sep 2008, listmember wrote:

> This search needs to be NOT case-sensitive.
>
> How can you do this?
>
> Is it doable if TCharacter (or whatever you call it) has no 'language'
> attribute?

'I am on FoolStrasse' versus 'I am on FoolStraße' is not an upper/lower 
case issue. Strasse and Straße have the same casing. So yes, you can do 
case-insensitive search.


The problem you describe does exist. ü and ue are equivalent in German, 
but not in Dutch. So a German speaker searching for ü will also want to 
receive results for ue; a Dutch-speaking person would not.


This however, should not be fixed at the string level, but at the file 
format level. I.e. in HTML you can do . You could design a 
#27 escape code for text files if you'd like.


Daniël

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-12 Thread Mattias Gärtner

Quoting listmember <[EMAIL PROTECTED]>:

>[...]
> You have multilanguage text as data. Someone has asked you to search it
> and see if a certain piece of string (in a given language) exists in it.
>
> This search needs to be NOT case-sensitive.
>
> How can you do this?
>
> Is it doable if TCharacter (or whatever you call it) has no 'language'
> attribute?
>
> [Note that, here 'TCharacter' isn't necessarily an object; it might as
> well be a simple record structure.]

AFAIK for most programmers this is not a common task. Most programs need less
(one language or codepage) or more (phonetic, semantic, statistical search).
Can you explain why you think that this particular problem requires compiler
magic?

> [...]
> Are there, in Unicode, start-stop markers that denote 'language'?

Is it needed?
Are there any Unicode characters whose upper/lower case depends on language?


>[...]
> Comparing is a lot more important an operation than collating --or,
> rather, collation is achieveable only if you can do proper comparisons.
>
> Take this, for example:
>
> "if SameText(SomeString, SomeOtherString) then do ..."
>
> For this to work properly, in both 'SomeString' and 'SomeOtherString',
> you need to know which language *each* character belongs to.

Comparing texts can be done with various meanings. For example: byte comparison,
simple case insensitive comparison, not literal comparison, compare like this
library, 
Which one do you mean?


>[...]
> Here is a simple example for you:
>
> "if SameText('I am on FoolStrasse', 'I am on FoolStraße') then do ..."
>
> Now.. how are you going to decide that SameText() function here returns
> true unless you have information that the substring 'FoolStraße' is in
> German?

The two strings are in the same language, but are written with different
orthography (Rechtschreibung). You need dictionaries and spelling systems to
implement such comparisons. This is beyond a compiler or an RTL.


> I know that this is a very simple example --that 'ß' exists only in
> German, and that you could infer that when you met that char.
>
> But, this highlights the problem --and there are times when you cannot
> infer.
>
> > In any case, I can write up several different algorithms how to do that.
>
> Please do. SameText(), for one, will need all the help it can get.
>
> > What I can not do (or what I do not want to do) is to decide which of
> > them other people do want to use.
>
> But, isn't this just that: IOW, you're deciding what other people will
> NOT want to use if you throw the 'language' attribute (for each char)
> out of the window..

What about loan words?


> > Or, if this is not what you think of, please clarify by example..
>
> Here is another typical example:
>
> SameText('Istanbul', 'istanbul') can only return true when both
> 'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani.
>
> Otherwise, the same SameText() has to return false.

I doubt that it is that easy.

Mattias


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-12 Thread listmember


Martin Friebe wrote:

> Just to make sure, all of this discussion is based on various collation

No part of this discussion is based on collation.

> I am going to leave out the object question for now. I said all I can
> say in earlier mails.

That's good. Thank you.

> And also from your comments it appears more a question of collation
> being stored with the string, substring, or even each char.

Martin, are you doing this on purpose? I mean, are you intentionally 
driving me up the wall?


Seriously. Can't you forget/drop this 'collation' word?!

And, then, think a little deeper.

Here is a scenario for you:

You have multilanguage text as data. Someone has asked you to search it 
and see if a certain piece of string (in a given language) exists in it.


This search needs to be NOT case-sensitive.

How can you do this?

Is it doable if TCharacter (or whatever you call it) has no 'language' 
attribute?


[Note that, here 'TCharacter' isn't necessarily an object; it might as 
well be a simple record structure.]



> As found in the last mail, there is currently no standard for handling
> cross-collation in any string function (that is string function, which
> could be collation based).
> 1) IMHO only few people would need this. For the majority it would be
> unwanted overhead.
> 2) Within those few, there would be too many different expectations as to
> what the "standard" should be. If FPC chose one such standard at will,
> it would benefit almost no one.

You're still stuck with that wretched word 'collation'.

> The best FPC could do is provide storage, for something that is not
> handled or obeyed in any function handling the data. This doesn't sound
> desirable to me. If anyone who needs it will have to implement the
> functions, then those may add their own storage for it too.
>
> Besides instead of storing it per char, you can use unused unicode as
> start/stop markers. So it can be implemented on top of a string that
> stores unicode-chars (and chars only, no attributes)

Are there, in Unicode, start-stop markers that denote 'language'?


>> All the others are not an intrinsic part of a char at all --they
>> vary by context.
>
> Why is language intrinsic to the text? An "A" is an "A" in any language.
> At best language is intrinsic to sorting/comparing(case on non
> case-sense) text

Comparing is a lot more important an operation than collating --or, 
rather, collation is achievable only if you can do proper comparisons.


Take this, for example:

"if SameText(SomeString, SomeOtherString) then do ..."

For this to work properly, in both 'SomeString' and 'SomeOtherString', 
you need to know which language *each* character belongs to.


If you don't have that information, you might as well not have a 
SameText() function in FPC.

>> Please note the 'case-INsensitive' keyword there.
>
> Well I needed an actual example where case sense differs by language
> (assuming we talk about languages using the same charset, not comparing
> Chinese with English).


Here is a simple example for you:

"if SameText('I am on FoolStrasse', 'I am on FoolStraße') then do ..."

Now.. how are you going to decide that SameText() function here returns 
true unless you have information that the substring 'FoolStraße' is in 
German?


I know that this is a very simple example --that 'ß' exists only in 
German, and that you could infer that when you met that char.


But, this highlights the problem --and there are times when you cannot 
infer.

> In any case, I can write up several different algorithms how to do that.

Please do. SameText(), for one, will need all the help it can get.

> What I can not do (or what I do not want to do) is to decide which of
> them other people do want to use.

But, isn't this just that: IOW, you're deciding what other people will 
NOT want to use if you throw the 'language' attribute (for each char) 
out of the window..



> Or, if this is not what you think of, please clarify by example..

Here is another typical example:

SameText('Istanbul', 'istanbul') can only return true when both 
'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani.


Otherwise, the same SameText() has to return false.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-12 Thread Martin Friebe

Just to make sure, all of this discussion is based on various collations 
for European languages? Or shall we include Arabic, Chinese and other 
languages? But they have their own chars; they can be identified without 
collation, so they do not need the language info to be distinguished 
from European text. (They may have collations, the same as a German text 
could be handled in different collations.)


listmember wrote:

>> So maybe the design is quite well thought?
>
> Adding a flag field is easy enough --if all you're doing is to do some
> sort of collation. In that sense, everything is well thought out.
>
> But..
>
> Life becomes very complicated when you begin to do things like FTS
> (full text search) on a multilanguage text in a DB engine.
>
> Your options, in this case, are just very limited:
>   -- Ignore the language issue.
> or
>   -- store each language in a different field (that is if you know how
> many there will be).
>
> Do you think this is a good solution --or, a hack.

True, that would be hard to do (in DB or pascal, or most other places). 
But again this is a very special case. And that is why none of the 
frameworks (DB, pascal, ...) include it. You have to do your own solution.


At no time did I say (nor, afaik, did anyone else) that you can not do 
your own object-based text-holding objects.

The questions were:
1) should FPC replace the string by an object (like Java)
2) which additional attributes should be stored by a string (per string 
/ per char)


And actually both of those questions can be moved out of the context of 
the Unicode implementation, because both of them could also be applied to 
current (char=byte) based strings.


I am going to leave out the object question for now. I said all I can 
say in earlier mails. And also from your comments it appears more a 
question of collation being stored with the string, substring, or even 
each char.


As found in the last mail, there is currently no standard for handling 
cross-collation in any string function (that is, any string function which 
could be collation based).
1) IMHO only few people would need this. For the majority it would be 
unwanted overhead.
2) Within those few, there would be too many different expectations as to 
what the "standard" should be. If FPC chose one such standard at will, 
it would benefit almost no one.


The best FPC could do is provide storage for something that is not 
handled or obeyed in any function handling the data. This doesn't sound 
desirable to me. If anyone who needs it will have to implement the 
functions, then they may add their own storage for it too.


Besides, instead of storing it per char, you can use unused unicode as 
start/stop markers. So it can be implemented on top of a string that 
stores unicode-chars (and chars only, no attributes).



>> As for Storing info per string or per char. (Info could be anything:
>> collation, color, style, font, source-of-quote, author, creation-date,
>> file, ) everyone would like their own. So again FPC shouldn't do it.
>> Or everyone gets all the overhead of what all the others wanted.
>
> Collation is a function of language.

Right, but language is something you can apply to strings. You are not 
forced to do so. Strings work very well without language too.
Same as you saying "no gui". Strings work without display. Font/Style is 
a function of rendering. I may want to search a string but only want to 
look at chars marked as bold.


Language is an extension to string, in the same way as rendering 
info or source info is. To you language may matter a great deal. To 
others other attributes will matter.

> All the others are not an intrinsic part of a char at all --they
> vary by context.

Why is language intrinsic to the text? An "A" is an "A" in any language. 
At best language is intrinsic to sorting/comparing(case on non 
case-sense) text

>> If pascal doesn't suit the need of a specific task, choose a different
>> tool. Instead of inventing a new pascal.
>
> Thank you for the advice.
> But, instead of jailing this discussion to at best a laterally
> relevant issue of collation, can I ask you to think for a moment:
> How on earth can you do a case-INsensitive search in *any* given
> string that contains multiple language substrings?
>
> Please note the 'case-INsensitive' keyword there.

Well I needed an actual example where case sense differs by language 
(assuming we talk about languages using the same charset, not comparing 
Chinese with English).


In any case, I can write up several different algorithms for how to do 
that. What I can not do (or what I do not want to do) is to decide which 
of them other people want to use.


Search non-case-sensitive 'UP LOW' in ' ups upper lows lower'

with the following attributes:
'UP LOW' is a string of 2 languages.
The word UP is in a language that defines "U" and "u" as different 
letters (they do not only differ by case, but differ the same way "a" and 
"b" do).
The word LOW is in a language where all letters have lower-case 
equivalents (as in Engl

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread ABorka


Hi,

Thanks for pointing me to the Lazarus thread about this and the bug 
report. Checked them.


But as I understand there is no solution available at the moment for this.

I have a database that is not encoded utf8 (and it will never be because 
other client programs are accessing it and their users do not want/need 
to be converted to unicode). How do I get the field values into 
FPC/Lazarus into a string variable? Right now the non-unicode strings 
are returned as empty from a database field due to FCL conversion functions.


Not to mention writing something back to the database.
Is there a function to convert 'My Perfect™ World®' to whatever format 
the components require, and vice versa? Something for the single-byte 
character table up to #255 (English letters with some special characters 
like the above example).
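
For a single-byte database encoding, the conversion in both directions is a plain transcoding step. A sketch in Python for illustration only, assuming the database bytes are Windows-1252 (cp1252); that encoding is an assumption here, and the helper names `db_to_utf8` and `utf8_to_db` are hypothetical. In FPC/Lazarus the same step would be done with their own conversion routines:

```python
def db_to_utf8(raw: bytes, db_encoding: str = "cp1252") -> bytes:
    """Transcode bytes read from the database into UTF-8."""
    return raw.decode(db_encoding).encode("utf-8")


def utf8_to_db(data: bytes, db_encoding: str = "cp1252") -> bytes:
    """Transcode UTF-8 bytes back into the database encoding."""
    return data.decode("utf-8").encode(db_encoding)


raw = "My Perfect™ World®".encode("cp1252")  # what the table would hold
as_utf8 = db_to_utf8(raw)
print(as_utf8.decode("utf-8"))
print(utf8_to_db(as_utf8) == raw)

# The raw bytes are not valid UTF-8, so treating them as UTF-8 cannot work:
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
```

The important part is that the conversion happens explicitly at the boundary, in both directions, instead of the bytes being interpreted as UTF-8 by accident.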



JoshyFun wrote:

Hello ABorka,

Thursday, September 11, 2008, 7:26:50 PM, you wrote:

A> The database field can contain any string with '®' in it for this to happen
A> for example: 'sometext®'
A> It seems that
A> ListBox1.Items.Add(SQL1.FieldByName('MyTableField').AsString);
[...]
A> will only put an empty string into the Listbox.
A> Somewhere inside FCL, where the Listbox item is inserted there is a
A> UTF8Decode which ends up with the empty string because of the '®'  #174
A> character it thinks that it is a unicode encoded character and tries to
A> get the additional bytes for it which ain't there.

http://bugs.freepascal.org/view.php?id=11791

A> Not sure how can this be circumvented (using some conversion function?)
A> or if it is a bug.

Check Lazarus list, there is a quite recent thread about that "Unicode
and DBAware" (is the text of the subject).




Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread listmember


> So maybe the design is quite well thought?

Adding a flag field is easy enough --if all you're doing is to do some 
sort of collation. In that sense, everything is well thought out.


But..

Life becomes very complicated when you begin to do things like FTS (full 
text search) on a multilanguage text in a DB engine.


Your options, in this case, are just very limited:

  -- Ignore the language issue.
or
  -- store each language in a different field (that is if you know how 
many there will be).


Do you think this is a good solution --or, a hack.


> As for Storing info per string or per char. (Info could be anything:
> collation, color, style, font, source-of-quote, author, creation-date,
> file, ) everyone would like their own. So again FPC shouldn't do it.
> Or everyone gets all the overhead of what all the others wanted.

Collation is a function of language.

All the others are not an intrinsic part of a char at all --they vary 
by context.



> Also FPC is a programming language. Not a word processing tool

Well, they should have remembered that before they added in char and 
string types when everything could perfectly be represented with a byte.

> Then instead of asking for strings as object, I would ask for an
> additional ref-counted object type (with auto destruction). The string
> library could be based on this. I am not asking for such a thing
> because a) it wouldn't be pascal anymore. b) beware of the mem-leaks

Personally, I gave up on strings as objects on the compiler level. That 
could, of course, be added as a lib.



> If pascal doesn't suit the need of a specific task, choose a different
> tool. Instead of inventing a new pascal.

Thank you for the advice.

But, instead of jailing this discussion to at best a laterally relevant 
issue of collation, can I ask you to think for a moment:


How on earth can you do a case-INsensitive search in *any* given string 
that contains multiple language substrings?


Please note the 'case-INsensitive' keyword there.

> Btw in normal math you can not divide a number by zero... Of course you
> can define your own math

And, the point is??..

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Martin Friebe


listmember wrote:



>> I also do not know of other apps that could do this. (And it may not be
>> possible). Look around. Databases for example, AFAIK the most you can do
>> is define a collation per column.
>
> True. But, that does not mean that those app/databases are well
> thought out. Does it?

Point of view. Those DBs get sold, so either people take what they can 
get and silently accept it (I haven't seen discussions like this on 
related DB discussion groups [ or maybe I read the wrong groups :) ]),
or the majority of people don't need it.

BTW people want their DB to sort text in a way that helps finding 
entries in the result. So the ordering process should not rely on 
knowledge of whether a word is English or French. If it did rely on the 
language, then the ordering would not help the search, because you would 
have to know the language of all other words to find the one word you are 
looking for.


So maybe the design is quite well thought?



>> And how would you sort the following example, with mixed collation. Take
>> the various German collations. ae can be used as a substitution for
>> a-umlaut.
>
> This is actually an arbitrary decision --there is no agreed standard on
> this, that I am aware-- so, each developer can have their own way.

Well yes, of course you can define how to. But then everyone has a 
different need, and a different definition. That would mean FPC had to 
implement dozens of algorithms.
So it seems better to leave it to each person, as it seems it will be an 
individual thing anyway.


As for Storing info per string or per char. (Info could be anything: 
collation, color, style, font, source-of-quote,  author, creation-date, 
file, ) everyone would like their own. So again FPC shouldn't do it. 
Or everyone gets all the overhead of what all the others wanted.


Also FPC is a programming language. Not a word processing tool.
And FPC is pascal. Pascal (afaik) has reference counted strings. And 
objects are not reference counted. Not to mention objects (as string 
type) would only benefit if everyone was allowed to create their own 
child-classes.
Then instead of asking for strings as object, I would ask for an 
additional ref-counted object type (with auto destruction). The string 
library could be based on this. I am not asking for such a thing 
because a) it wouldn't be pascal anymore. b) beware of the mem-leaks


If pascal doesn't suit the need of a specific task, choose a different 
tool. Instead of inventing a new pascal.

I don't do shell scripts in pascal. And simple web scripts are php or perl.




>> How would you sort data where one source is of one collation, the other
>> source of another (or even worse the collation changes halfway through)?
>> It is impossible by definition.
>
> No. It is not impossible.
> But, yes, there is no definition (standard).
>
> It would be up to the developer or the entity that the developer is
> working in.

Btw in normal math you can not divide a number by zero... Of course you 
can define your own math



>> I even think that collation is not part of the string. It does not
>> change the meaning of the string. It is only used in specific
>> operations. And then it must be one collation for both strings. So if
>> each of the strings had a collation that would cause an issue.
>
> But, my question is --imho-- a lot more relevant to the thread at hand:
>
> How would you do case-insensitive search in a multilingual text.

Same as above applies. If every char (or substring) has a collation of 
its own, then you need to define how to compare cross-collation.


Because
find('E'[collation1],  'merci'[collation2] + 'mein herr'[collation3])

needs to compare an E (that wants collation1 for the compare) with each 
of the 'e' (that want other collations).
Maybe collation1 says that E should be equal in upper and lower case, while 
the other collations do not? Or vice versa.


There is no standard.

> [this has nothing to do with rendering or GUI.]



Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread listmember


> IMHO You can't? But you could use a TStringList.

I don't think I could.

Because, in TStringList, you have no way of knowing what language each 
item belongs to.


You could, of course, work around it by adding a fake object to each 
item denoting the language, but that does not make for a generalized
solution.

> I also do not know of other apps that could do this. (And it may not be
> possible). Look around. Databases for example, AFAIK the most you can do
> is define a collation per column.

True. But, that does not mean that those app/databases are well thought 
out. Does it?



> And how would you sort the following example, with mixed collation. Take
> the various German collations. ae can be used as a substitution for
> a-umlaut.

This is actually an arbitrary decision --there is no agreed standard on 
this, that I am aware-- so, each developer can have their own way.

> How would you sort data where one source is of one collation, the other
> source of another (or even worse the collation changes halfway through)?
> It is impossible by definition.

No. It is not impossible.
But, yes, there is no definition (standard).

It would be up to the developer or the entity that the developer is 
working in.



> I even think that collation is not part of the string. It does not
> change the meaning of the string. It is only used in specific
> operations. And then it must be one collation for both strings. So if
> each of the strings had a collation that would cause an issue.

But, my question is --imho-- a lot more relevant to the thread at hand:

How would you do case-insensitive search in a multilingual text.

[this has nothing to do with rendering or GUI.]

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Martin Friebe


listmember wrote:

>> Actually, UTF-8 can contain bidi info, it's indeed a matter of the
>> renderer.
>
> And, how do you propose doing a case-insensitive search in a given
> text that contains multiple languages?

I assume you speak of multiple collations in one string?
IMHO You can't? But you could use a TStringList.

I also do not know of other apps that could do this. (And it may not be 
possible). Look around. Databases for example, AFAIK the most you can do 
is define a collation per column.


And how would you sort the following example, with mixed collation. Take 
the various German collations. ae can be used as a substitution for 
a-umlaut.


In some collations it sorts as ae (between ad and af), in others it sorts 
as "a-umlaut" (immediately behind "a"):

1)   a, ab, ae
2)   a, ae, ab

How would you sort data where one source is of one collation, the other 
source of another (or even worse the collation changes halfway through)? 
It is impossible by definition.
Because, taking the strings "ab" and "ae" above, either of them can come 
first when sorted, depending on the collation; if more than one collation 
is involved, the result is undefined.
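
The two orderings can be reproduced with two sort keys. A Python sketch for illustration only; the key functions are simplified stand-ins for the two German conventions described above (they handle only 'ä') and are not real collation tables:

```python
def key_expand(word: str):
    """Collation 1: 'ä' sorts as if written 'ae'."""
    return word.replace("ä", "ae")


def key_base_letter(word: str):
    """Collation 2: 'ä' sorts with 'a', directly behind the plain letter."""
    base = word.replace("ä", "a")
    # Tie-breaker: among equal base forms, the plain spelling comes first.
    return (base, word != base)


words = ["ab", "ä", "a"]
print(sorted(words, key=key_expand))       # ['a', 'ab', 'ä']
print(sorted(words, key=key_base_letter))  # ['a', 'ä', 'ab']
```

Each call to `sorted` takes exactly one key function, which mirrors the point being made: a sort needs a single collation for all the strings involved.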


I even think that collation is not part of the string. It does not 
change the meaning of the string. It is only used in specific 
operations. And then it must be one collation for both strings. So if 
each of the strings had a collation, that would cause an issue.


Martin

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread listmember




Actually, UTF-8 can contain bidi info, it's indeed a matter of the
renderer.


And, how do you propose doing a case-insensitive search in a given text 
that contains multiple languages?


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Mattias Gaertner

On Thu, 11 Sep 2008 22:56:49 +0200
Martin Schreiber <[EMAIL PROTECTED]> wrote:

>[...]
> > Doesn't that mean we will be --by design-- unable to write something
> > like 'Yom Kippur (יוֹם כִּפּוּר)' on a caption?

Yes and more. See below.


> > This is why I keep asking that the 'TCharacter' or 'TChar' needs to
> > have a language attribute.
> >
> MSEgui has a richstringty type, a combination of a widestring and a
> dynamic array of formatting info. There are formatting infos for the
> changes only, a richstringty without formatting info has a nil
> pointer for the dynamic array. See lib/common/kernel/mserichstring.pas
> http://sourceforge.net/projects/mseide-msegui/
> 
> "
> type
>  newinfoty = (ni_bold=ord(fs_bold),ni_italic=ord(fs_italic),
>   ni_underline=ord(fs_underline),ni_strikeout=ord(fs_strikeout),
>   ni_selected=ord(fs_selected),
>   //same order as in fontstylety
>  ni_fontcolor,ni_colorbackground,ni_delete);
>  newinfosty = set of newinfoty;
> 
> const
>  fonthandleflags = [ni_bold,ni_italic];
>  fontstyleflags =
> [ni_bold,ni_italic,ni_underline,ni_strikeout,ni_selected];
> 
> type
>  charstylety = record
>   fontcolor,colorbackground: pcolorty;
>   fontstyle: fontstylesty;
>  end;
>  pcharstylety = ^charstylety;
> 
>  charstylearty = array of charstylety;
> 
>  formatinfoty = record
>   index: integer;//0-> from first char
>   newinfos: newinfosty;
>   style: charstylety;
>  end;
> 
>  pformatinfoty = ^formatinfoty;
>  formatinfoarty = array of formatinfoty;
>  pformatinfoarty = ^formatinfoarty;
> 
>  richstringty = record
>   text: msestring;
>   format: formatinfoarty;
>  end;
> "
> 
> It was designed for fast processing in MSEide source code editor.

It is fast, but it misses some Unicode features, such as compound
(decomposed) characters, where a base letter is followed by a combining
mark.
For example: the Mac OS X file system uses compound characters for German
umlauts. MSEide shows the o-umlaut as an o followed by a box.
Lazarus SynEdit under gtk2 shows it correctly, because it uses pango,
which has an almost complete Unicode implementation. But editing is
wrong in SynEdit, because it does not handle compound characters
yet. Fortunately, typing an o-umlaut creates a 'normal' single character in
SynEdit. The native gtk2 widgets like TButton and TEdit
handle compound characters correctly.
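
To make the o-umlaut case concrete, here is a minimal sketch of the two
representations at the string level. The variable names are made up for
illustration, and the program assumes a compiler that already provides the
UnicodeString type.

```pascal
program DecomposedUmlaut;
{$mode objfpc}{$H+}

var
  Precomposed, Decomposed: UnicodeString;
begin
  // U+00F6 LATIN SMALL LETTER O WITH DIAERESIS as one UTF-16 code unit
  Precomposed := WideChar($00F6);
  // The same letter as 'o' (U+006F) followed by U+0308 COMBINING DIAERESIS
  Decomposed := WideChar($006F);
  Decomposed := Decomposed + WideChar($0308);

  WriteLn('precomposed code units: ', Length(Precomposed)); // 1
  WriteLn('decomposed code units:  ', Length(Decomposed));  // 2
end.
```

An editor that assumes one array element per visible character will treat
the second string as two characters, which matches the "o followed by a box"
symptom described above.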

I wonder how a TCharacter will be defined that supports all Unicode
features. Probably it will be a monster that only a few text editors
will want to use.

Mattias

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Florian Klaempfl


Martin Schreiber wrote:
> On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote:
>> I've continued to work on support of an unicodestring type in fpc. It's
>> currently in an svn branch at:
>> http://svn.freepascal.org/svn/fpc/branches/unicodestring
>> and will be merged later to trunk. The unicodestring type is a ref.
>> counted utf-16 string. On non-windows, widestring is mapped to this
>> type. If you're interested in unicode support please test, give feedback
>> here and submit fixes.
>
> I have a crash in MSEide startup in a procedure finalization section:
> [...]
> I saw that you merged unicodestring to trunk. Should I test with trunk
> instead of unicodestring branch?

Yes. The unicodestring branch is closed.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Martin Schreiber

On Thursday 11 September 2008 22.33:32 listmember wrote:
>  >> procedure TLabel.Paint(...)
>  >> begin
>  >>   if Caption.IsRTL then
>  >>     DrawCaptionRTL(0,0,Caption.AsUTF8, flags)
>  >>   else
>  >>     DrawCaption(0,0,Caption.AsUTF8, flags);
>  >> end;
>  >>
>  >> Is not that enough?
>  >
>  > What is the gain as opposed to
>  >
>  >   procedure TLabel.Paint(...)
>  >   begin
>  >if IsRTL(Caption) then
>  >  DrawCaptionRTL(0,0,AsUTF8(Caption), flags)
>  >   else
>  >  DrawCaption(0,0,AsUTF8(Caption), flags);
>  >   end;
>  >
>  > In other words where is the benefit from OOP in this ?
>
> IMO, both are deficient as they both assume that a string block (text)
> is either RTL or LTR.
>
> Doesn't that mean we will be --by design-- unable to write something
> like 'Yom Kippur (יוֹם כִּפּוּר)' on a caption?
>
> This is why I keep asking that the 'TCharacter' or 'TChar' needs to have
> a language attribute.
>
MSEgui has a richstringty type, a combination of a widestring and a dynamic 
array of formatting info. There are formatting infos for the changes only, a 
richstringty without formatting info has a nil pointer for the dynamic array. 
See lib/common/kernel/mserichstring.pas
http://sourceforge.net/projects/mseide-msegui/

"
type
 newinfoty = (ni_bold=ord(fs_bold),ni_italic=ord(fs_italic),
  ni_underline=ord(fs_underline),ni_strikeout=ord(fs_strikeout),
  ni_selected=ord(fs_selected),
  //same order as in fontstylety
 ni_fontcolor,ni_colorbackground,ni_delete);
 newinfosty = set of newinfoty;

const
 fonthandleflags = [ni_bold,ni_italic];
 fontstyleflags = [ni_bold,ni_italic,ni_underline,ni_strikeout,ni_selected];

type
 charstylety = record
  fontcolor,colorbackground: pcolorty;
  fontstyle: fontstylesty;
 end;
 pcharstylety = ^charstylety;

 charstylearty = array of charstylety;

 formatinfoty = record
  index: integer;//0-> from first char
  newinfos: newinfosty;
  style: charstylety;
 end;

 pformatinfoty = ^formatinfoty;
 formatinfoarty = array of formatinfoty;
 pformatinfoarty = ^formatinfoarty;

 richstringty = record
  text: msestring;
  format: formatinfoarty;
 end;
"

It was designed for fast processing in MSEide source code editor.
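
The "formatting infos for the changes only" idea can be sketched in a few
lines. The types below are simplified, hypothetical stand-ins and not the
real MSEgui declarations; the sketch assumes the change list is sorted by
position and that each entry states the style in effect from that position
onwards.

```pascal
program RichStringSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical, simplified stand-ins for the MSEgui types quoted above.
  TStyleFlag = (sfBold, sfItalic, sfUnderline);
  TStyleFlags = set of TStyleFlag;

  TFormatChange = record
    Start: Integer;      // 0-based position of the first affected character
    Style: TStyleFlags;  // style in effect from Start onwards
  end;

  TRichString = record
    Text: UnicodeString;
    Changes: array of TFormatChange; // nil when the text is unformatted
  end;

// Returns the style in effect at the 0-based position CharIndex.
// Assumes Changes is sorted by ascending Start.
function StyleAt(const S: TRichString; CharIndex: Integer): TStyleFlags;
var
  i: Integer;
begin
  Result := [];
  for i := 0 to High(S.Changes) do
  begin
    if S.Changes[i].Start > CharIndex then
      Break;
    Result := S.Changes[i].Style;
  end;
end;

var
  R: TRichString;
begin
  R.Text := 'plain bold plain';
  SetLength(R.Changes, 2);
  R.Changes[0].Start := 6;   // "bold" starts here
  R.Changes[0].Style := [sfBold];
  R.Changes[1].Start := 10;  // back to the default style
  R.Changes[1].Style := [];

  WriteLn(sfBold in StyleAt(R, 0));   // FALSE: before the first change
  WriteLn(sfBold in StyleAt(R, 7));   // TRUE: inside the bold run
  WriteLn(sfBold in StyleAt(R, 12));  // FALSE: after the run ends
end.
```

A string without any formatting costs nothing beyond the text itself, since
the dynamic array stays nil.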

Martin

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Florian Klaempfl


Marco van de Voort wrote:
> In our previous episode, listmember said:
>>>   else
>>>  DrawCaption(0,0,AsUTF8(Caption), flags);
>>>   end;
>>>
>>> In other words where is the benefit from OOP in this ?
>>
>> IMO, both are deficient as they both assume that a string block (text)
>> is either RTL or LTR.
>
> The assignment only transfers data to the object "TCaption".
>
>> Doesn't that mean we will be --by design-- unable to write something
>> like 'Yom Kippur (יוֹם כִּפּוּר)' on a caption?
>
> TCaption is responsible for rendering. Including LTR and RTL. Not the
> string.

Actually, UTF-8 can contain bidi info, it's indeed a matter of the renderer.
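
As an illustration of "bidi info" travelling inside the text: Unicode
defines directional marks such as U+200F RIGHT-TO-LEFT MARK, which are
ordinary code points and survive UTF-8 encoding like any other character.
The sketch below is only an assumption about how one might embed such a
mark. Where a mark is actually needed is decided by the bidi algorithm in
the renderer, and the placement here is purely illustrative.

```pascal
program BidiMarkSketch;
{$mode objfpc}{$H+}

const
  RLM = WideChar($200F); // U+200F RIGHT-TO-LEFT MARK

var
  Caption: UnicodeString;
  Encoded: UTF8String;
begin
  Caption := 'Yom (';
  // Hebrew letters yod, vav, final mem, given as code points so that the
  // source file itself stays plain ASCII.
  Caption := Caption + WideChar($05D9) + WideChar($05D5) + WideChar($05DD);
  Caption := Caption + RLM + ')';

  Encoded := UTF8Encode(Caption);
  WriteLn('UTF-16 code units: ', Length(Caption)); // 5 + 3 + 1 + 1 = 10
  WriteLn('UTF-8 bytes:       ', Length(Encoded)); // 5 + 6 + 3 + 1 = 15
end.
```

The string type only stores the mark. Interpreting it remains the job of
whatever draws the caption.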

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Marco van de Voort

In our previous episode, listmember said:
>  >   else
>  >  DrawCaption(0,0,AsUTF8(Caption), flags);
>  >   end;
>  >
>  > In other words where is the benefit from OOP in this ?
> 
> IMO, both are deficient as they both assume that a string block (text) 
> is either RTL or LTR.

The assignment only transfers data to the object "TCaption". 
 
> Doesn't that mean we will be --by design-- unable to write something 
> like 'Yom Kippur (יוֹם כִּפּוּר)' on a caption?

TCaption is responsible for rendering. Including LTR and RTL. Not the
string.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread listmember



>> procedure TLabel.Paint(...)
>> begin
>>   if Caption.IsRTL then
>>     DrawCaptionRTL(0,0,Caption.AsUTF8, flags)
>>   else
>>     DrawCaption(0,0,Caption.AsUTF8, flags);
>> end;
>>
>> Is not that enough?
>
> What is the gain as opposed to
>
>   procedure TLabel.Paint(...)
>   begin
>if IsRTL(Caption) then
>  DrawCaptionRTL(0,0,AsUTF8(Caption), flags)
>   else
>  DrawCaption(0,0,AsUTF8(Caption), flags);
>   end;
>
> In other words where is the benefit from OOP in this ?

IMO, both are deficient as they both assume that a string block (text) 
is either RTL or LTR.


Doesn't that mean we will be --by design-- unable to write something 
like 'Yom Kippur (יוֹם כִּפּוּר)' on a caption?


This is why I keep asking that the 'TCharacter' or 'TChar' needs to have 
a language attribute.



Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Martin Schreiber

On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote:
> I've continued to work on support of an unicodestring type in fpc. It's
> currently in an svn branch at:
> http://svn.freepascal.org/svn/fpc/branches/unicodestring
> and will be merged later to trunk. The unicodestring type is a ref.
> counted utf-16 string. On non-windows, widestring is mapped to this
> type. If you're interested in unicode support please test, give feedback
> here and submit fixes.
>

I have a crash in MSEide startup in a procedure finalization section:
"
#0  77892373 :0 ??()
#1  0082CDF4 :0 U_SYSTEM_ENTRYINFORMATION()
#2  03B7FB2C :0 ??()
#3  03B7FAAC :0 ??()
#4  03C22C1C :0 ??()
#5  0082D9F4 :0 U_SYSTEM_FREELISTS()
#6  03B7F874 :0 ??()
#7  0040F5EB heap.inc:1127 SYSFREEMEM(P=0x0)
#8  778922F8 heap.inc:0 ??()
#9  0082E500 heap.inc:0 U_HEAPTRC_OWNFILE()
#10  00410482 systhrd.inc:300 SYSENTERCRITICALSECTION(CS=void)
#11  0040FE94 thread.inc:190 ENTERCRITICALSECTION(CS={DEBUGINFO = 0x0, 
LOCKCOUNT = 1, RECURSIONCOUNT = 0, OWNINGTHREAD = 0, LOCKSEMAPHORE = 812, 
SPINCOUNT = 0})
#12  00414571 heaptrc.pp:666 TRACEFREEMEMSIZE(P=0x6d11b8, SIZE=0)
#13  004146BB heaptrc.pp:722 TRACEFREEMEM(P=0x6d11b8)
#14  0040E404 heap.inc:275 FREEMEM(P=0x6d11b8)
#15  004093FA ustrings.inc:179 DISPOSEUNICODESTRING(S=0x6d11b8)
#16  0040947D ustrings.inc:206 fpc_unicodestr_decr_ref(S=0x6d11b8)
#17  004A9B1C msesysintf.pas:306 WINFILEPATH(DIRNAME=0x0, FILENAME=0x6d11b8, 
result=0x3c22fa8)
#18  004AB63F msesysintf.pas:1436 SYS_OPENDIRSTREAM(STREAM={INFOLEVEL = 
FIL_NAME, DIRNAME = 0x3c23148, MASK = 0x3b7faa0, INCLUDE = [FA_ALL], EXCLUDE 
= [], PLATFORMDATA = {0, 208983208, 1, 4294967295, 0, 0, 0, 0}})
#19  004B5BB2 msefileutils.pas:640 SEARCHFILE(AFILENAME=0xc7a07b0, 
ADIRNAME=0x3c22e08, result=0x0)
#20  004B5DED msefileutils.pas:671 SEARCHFILE(AFILENAME=0x6bf2f8, 
ADIRNAMES=0x3c22ed8, highADIRNAMES=0, result=0x0)
#21  004B5F9C msefileutils.pas:698 FINDFILE(FILENAME=0x6bf2f8, 
DIRNAMES=0x3c22ed8, PATH=0x0, highDIRNAMES=0)
#22  004C03E1 msestatfile.pas:244 TSTATFILE__READSTAT(STREAM=0x0, 
this=0x3c6b918)
#23  00453CF8 main.pas:1514 TMAINFO__MAINONLOADED(SENDER=0x3c03d40, 
this=0x3c03d40)
#24  0050A717 mseforms.pas:854 
TCUSTOMMSEFORM__DOEVENTLOOPSTART(this=0x3c03d40)
#25  0050A763 mseforms.pas:863 TCUSTOMMSEFORM__RECEIVEEVENT(EVENT=0xc7016f8, 
this=0x3c03d40)
#26  0048CA3A mseevent.pas:213 TOBJECTEVENT__DELIVER(this=0xc7016f8)
#27  0042E7D0 msegui.pas:12666 
TINTERNALAPPLICATION__EVENTLOOP(AMODALWINDOW=0x0, ONCE=false, this=0x3bd9460)
#28  0042F52C msegui.pas:13063 TINTERNALAPPLICATION__DOEVENTLOOP(ONCE=false, 
this=0x3bd9460)
#29  0048B3F8 mseapplication.pas:1132 TCUSTOMAPPLICATION__RUN(this=0x3bd9460)
#30  004025D1 mseide.pas:59 main()
"
I could not find a simple program to demonstrate the failure. Something 
strange is that the following procedure calls fpc_WideStr_Decr_Ref in 
finalization section:
"
const
 quotechar = unicodechar('"');

procedure requote(var path: unicodestring; const newvalue: unicodestring);
begin
 if punicodechar(path)^ = quotechar then begin
  path:= quotechar + newvalue;
 end
 else begin
  path:= newvalue;
 end;
end;
"
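
For anyone who wants to experiment, the procedure can be wrapped in a
stand-alone harness like the one below. This is only a sketch: the loop
count and the test strings are arbitrary assumptions, and a program of this
kind is not known to reproduce the crash.

```pascal
program RequoteHarness;
{$mode objfpc}{$H+}

const
  quotechar = unicodechar('"');

procedure requote(var path: unicodestring; const newvalue: unicodestring);
begin
  if punicodechar(path)^ = quotechar then begin
    path:= quotechar + newvalue;
  end
  else begin
    path:= newvalue;
  end;
end;

var
  p: unicodestring;
  i: integer;
begin
  // Repeat the calls so that many temporary strings are created and released.
  for i:= 1 to 100000 do begin
    p:= '"quoted';
    requote(p, 'new');
    if length(p) <> 4 then
      writeln('unexpected length after quoted input: ', length(p));
    p:= 'plain';
    requote(p, 'new');
    if length(p) <> 3 then
      writeln('unexpected length after plain input: ', length(p));
  end;
  writeln('done');
end.
```

Building it with heaptrc enabled (-gh) gives the same kind of heap checking
that shows up in the backtrace above.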
I saw that you merged unicodestring to trunk. Should I test with trunk instead 
of unicodestring branch?

Martin

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread ABorka



A conversion problem occurs and an empty string is put into a TListBox if I
try to get a field value containing some special characters from an SQL
result (using Zeos).


The database field can contain any string with '®' in it for this to happen,
for example: 'sometext®'

It seems that

ListBox1.Items.Add(SQL1.FieldByName('MyTableField').AsString);

or even

var s:string;
begin
  s:=SQL1.FieldByName('MyTableField').AsString;
  ListBox1.Items.Add(s);
end;

will only put an empty string into the ListBox.
Somewhere inside FCL, where the ListBox item is inserted, there is a
UTF8Decode which ends up with the empty string. Because of the '®' (#174)
character, it treats that byte as part of a multi-byte UTF-8 sequence and
looks for the other bytes of the sequence, which are not there.


I used the Lazarus-0.9.25-16495-fpc-2.2.3-20080909-win32.exe build.


I am not sure whether this can be circumvented (using some conversion
function?) or whether it is a bug.
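
To illustrate why the byte #174 trips a UTF-8 decoder and what a conversion
would have to do, here is a minimal sketch. Latin1ToUtf8 is a hypothetical
helper, not an FPC or Lazarus function, and it assumes the field data is
ISO-8859-1. cp1252 differs from ISO-8859-1 in the range $80..$9F, so a real
cp1252 conversion needs a lookup table for those bytes; '®' ($AE) is the
same in both.

```pascal
program Latin1ToUtf8Sketch;
{$mode objfpc}{$H+}

// Hypothetical helper: re-encodes a string of ISO-8859-1 bytes as UTF-8.
// Bytes are written by index so that no implicit code page conversion
// gets involved.
function Latin1ToUtf8(const S: AnsiString): AnsiString;
var
  i, n: Integer;
  b: Byte;
begin
  Result := '';
  SetLength(Result, 2 * Length(S)); // worst case: every byte becomes two
  n := 0;
  for i := 1 to Length(S) do
  begin
    b := Ord(S[i]);
    if b < $80 then
    begin
      Inc(n);
      Result[n] := Chr(b);                  // ASCII is unchanged
    end
    else
    begin
      Inc(n);
      Result[n] := Chr($C0 or (b shr 6));   // lead byte: 110000xx
      Inc(n);
      Result[n] := Chr($80 or (b and $3F)); // continuation byte: 10xxxxxx
    end;
  end;
  SetLength(Result, n);
end;

var
  Src, Dst: AnsiString;
begin
  Src := 'sometext?';
  Src[9] := Chr(174); // the single byte that encodes the (R) sign
  Dst := Latin1ToUtf8(Src);
  WriteLn(Length(Src), ' -> ', Length(Dst));                     // 9 -> 10
  WriteLn(HexStr(Ord(Dst[9]), 2), ' ', HexStr(Ord(Dst[10]), 2)); // C2 AE
end.
```

In UTF-8 the '®' sign is the two-byte sequence C2 AE. A lone AE byte is a
continuation byte without a lead byte, so a strict decoder rejects the
input.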



Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Martin Friebe


Michael Van Canneyt wrote:
> On Thu, 11 Sep 2008, Anton Kavalenka wrote:
>> Florian Klaempfl wrote:
>>> Graeme Geldenhuys wrote:
>>>> Remember, Unicode support is much more than simply storing and
>>>> displaying text. You have various encodings, RTL or LTR direction etc.
>>>> I can't see how a simple type can keep track of all such information
>>>> - but then, I don't know the internals of FPC either.  ;-)
>>>
>>> How would an OOP approach solve this? The problem isn't the tracking of
>>> things like encoding or directions but handling all this information.
>>
>> procedure TLabel.Paint(...)
>> begin
>>   if Caption.IsRTL then
>>     DrawCaptionRTL(0,0,Caption.AsUTF8, flags)
>>   else
>>     DrawCaption(0,0,Caption.AsUTF8, flags);
>> end;
>>
>> Is not that enough?
>
> What is the gain as opposed to
>
>  procedure TLabel.Paint(...)
>  begin
>   if IsRTL(Caption) then
>
> In other words where is the benefit from OOP in this ?


1) keeping track of info:
If you can store the info on an object, you can also store it on a record
(afaik). And a string (even the current string) is nothing else but a
(hidden) record. It already contains length info, and char data.


2) OO style vs functional:
Caption.IsRtl may be seen as syntactic sugar. But as far as I can see,
it can (almost?) always be translated into functional style. Instead of
having child classes you could overload the function for different types
of arguments.


3) For the real usage of OO using inheritance:
I am not sure if that is a good idea on any kind of *ref-counted*
data/object. I can see cases where the full OO power can make sense. But
when using OO, the objects should not be ref-counted. (IMHO)
Ref-counting is mainly used to free memory automatically. People rely
on it, and you get memory leaks.


Strings as they currently stand cannot contain pointers to other
strings. You cannot get circular references. Ref-counting will work.
Objects on the other hand can contain any data, including pointers to
other objects or self. Even if the built-in string objects don't contain
that kind of pointer, they can be subclassed and people will end up
with circular references.


Oh, and yes, I am aware. This risk already exists with dynarrays. But no
need to extend it.



So in my opinion, it may be nice to have a library of classes handling
all kinds of string (or shall we call it "text") data. But no magic on
them. They can use PChar and their own GetMem internally.
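
Point 2 can be sketched with plain overloaded functions. Everything in the
example is hypothetical: IsRTL is not an existing RTL routine, and the test
it performs (any code unit in the Hebrew or Arabic blocks) is a deliberately
crude assumption that ignores mixed-direction text.

```pascal
program OverloadSketch;
{$mode objfpc}{$H+}

// Crude, hypothetical heuristic: treat the text as RTL if it contains any
// code unit from the Hebrew or Arabic blocks (U+0590..U+06FF).
function IsRTL(const S: UnicodeString): Boolean; overload;
var
  i: Integer;
begin
  Result := False;
  for i := 1 to Length(S) do
    if (Ord(S[i]) >= $0590) and (Ord(S[i]) <= $06FF) then
    begin
      Result := True;
      Exit;
    end;
end;

// Same question for UTF-8 data: decode first, then reuse the version above.
function IsRTL(const S: UTF8String): Boolean; overload;
begin
  Result := IsRTL(UTF8Decode(S));
end;

var
  Wide: UnicodeString;
  Narrow: UTF8String;
begin
  // Hebrew letters yod, vav, final mem, given as code points.
  Wide := WideChar($05D9);
  Wide := Wide + WideChar($05D5) + WideChar($05DD);
  Narrow := UTF8Encode(Wide);

  WriteLn(IsRTL(Wide));   // resolved to the UnicodeString overload
  WriteLn(IsRTL(Narrow)); // resolved to the UTF8String overload
end.
```

The compiler picks the overload from the static type of the argument, so no
class hierarchy is needed, and both calls should report the same answer.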


Martin

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Paul Ishenin


Anton Kavalenka wrote:
>> How would an OOP approach solve this? The problem isn't the tracking of
>> things like encoding or directions but handling all this information.
>
> procedure TLabel.Paint(...)
> begin
>   if Caption.IsRTL then
>     DrawCaptionRTL(0,0,Caption.AsUTF8, flags)
>   else
>     DrawCaption(0,0,Caption.AsUTF8, flags);
> end;
>
> Is not that enough?

Sorry, cannot stay aside.
RTL is a property of the control, not of the string. And AsUTF8 is imo
unneeded:

procedure TLabel.DrawCaption(ACaption: TUTF8String);
begin
...
end;

procedure TLabel.DrawCaptionRTL(ACaption: TUTF8String);
begin
...
end;

procedure TLabel.Paint(...)
begin
  if RightToLeftAlignment then
    DrawCaptionRTL(0, 0, Caption, flags)
  else
    DrawCaption(0, 0, Caption, flags)
end;


And Caption can be any desired string type. It will be autoconverted to 
UTF8String if needed. I see no need in a string class - only unneeded 
overhead.
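
A minimal sketch of converting at the drawing boundary, with a console
stand-in instead of a real control. As far as I know, whether a plain
assignment from a wide string to a UTF8String really produces UTF-8 depends
on the compiler version and the installed widestring manager, so the sketch
makes the conversion explicit with UTF8Encode. The procedure and variable
names are assumptions made for the example.

```pascal
program CaptionBoundarySketch;
{$mode objfpc}{$H+}

// Stand-in for a drawing routine that wants UTF-8. It only reports the
// byte count, so the program has no GUI dependency.
procedure DrawCaption(const ACaption: UTF8String);
begin
  WriteLn('bytes handed to the drawing routine: ', Length(ACaption));
end;

var
  Caption: UnicodeString;
begin
  // The German word for "green": g, r, u-umlaut (U+00FC), n
  Caption := 'gr';
  Caption := Caption + WideChar($00FC) + 'n';
  DrawCaption(UTF8Encode(Caption)); // 5 bytes: g r C3 BC n
end.
```

The caption itself can stay in whatever string type the control prefers.
Only the call into the drawing layer needs to know about UTF-8.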


Best regards,
Paul Ishenin.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Michael Van Canneyt



On Thu, 11 Sep 2008, Anton Kavalenka wrote:

> Florian Klaempfl wrote:
> > Graeme Geldenhuys wrote:
> >
> > > Remember, Unicode support is much more than simply storing and
> > > displaying text. You have various encodings, RTL or LTR direction etc.
> > > I can't see how a simple type can keep track of all such information
> > > - but then, I don't know the internals of FPC either.  ;-)
> > >
> >
> > How would an OOP approach solve this? The problem isn't the tracking of
> > things like encoding or directions but handling all this information.
> >
> procedure TLabel.Paint(...)
> begin
>   if Caption.IsRTL then
>     DrawCaptionRTL(0,0,Caption.AsUTF8, flags)
>   else
>     DrawCaption(0,0,Caption.AsUTF8, flags);
> end;
> 
> Is not that enough?

What is the gain as opposed to

 procedure TLabel.Paint(...)
 begin
  if IsRTL(Caption) then
DrawCaptionRTL(0,0,AsUTF8(Caption), flags)
 else
DrawCaption(0,0,AsUTF8(Caption), flags);
 end;

In other words where is the benefit from OOP in this ? 

Michael.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-11 Thread Anton Kavalenka


Florian Klaempfl wrote:
> Graeme Geldenhuys wrote:
>> Remember, Unicode support is much more than simply storing and
>> displaying text. You have various encodings, RTL or LTR direction etc.
>> I can't see how a simple type can keep track of all such information
>> - but then, I don't know the internals of FPC either.  ;-)
>
> How would an OOP approach solve this? The problem isn't the tracking of
> things like encoding or directions but handling all this information.

procedure TLabel.Paint(...)
begin
  if Caption.IsRTL then
    DrawCaptionRTL(0,0,Caption.AsUTF8, flags)
  else
    DrawCaption(0,0,Caption.AsUTF8, flags);
end;

Is not that enough?



Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread listmember




>> But it is far more readable when there is special and reserved type
>> for which we could have special operators and converters just like
>> those we have for strings and widestrings.
>
> Oh, I thought people just complained in this thread that + isn't
> appropriate for strings anyways ...

People are, of course, entitled to their opinions.

And I, for one, would never force them against their will to use the
'+' operator for any sort of strings.

In the same breath, the fact that some of us object to '+' should not,
IMO, be the basis for not having 4-byte (or 6-byte) per char strings.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Florian Klaempfl


listmember wrote:
>>> compiler guys all the same} and ask, instead, to give us
>>> reference-counted 4-byte (actually, preferably 6-bytes) per cell
>>> arrays/strings.
>>
>> What's wrong with a dyn. array of DWord?
>
> Much like what's wrong with dynamic array of Word (as opposed to
> Widestring) or with dynamic array of byte (as opposed to string), really...
>
> Nothing much.
>
> But it is far more readable when there is special and reserved type for
> which we could have special operators and converters just like those we
> have for strings and widestrings.

Oh, I thought people just complained in this thread that + isn't
appropriate for strings anyways ...


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread listmember


>> compiler guys all the same} and ask, instead, to give us
>> reference-counted 4-byte (actually, preferably 6-bytes) per cell
>> arrays/strings.
>
> What's wrong with a dyn. array of DWord?

Much like what's wrong with dynamic array of Word (as opposed to
Widestring) or with dynamic array of byte (as opposed to string), really...

Nothing much.

But it is far more readable when there is special and reserved type for
which we could have special operators and converters just like those we
have for strings and widestrings.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Florian Klaempfl

listmember wrote:
>>> But, I could write a gigantic data mining application, a database
>>> application
>>> or a myriad of such apps that uses the above class without doing a
>>> single
>>> pixel of GUI stuff.
>>
>> I'd like to see that: it will be guaranteed dog slow :(
> 
> Hmm.. may be, maybe not.
> 
> Last year I wrote a natural language parser (Pascal) and gave the source to
> a Java developer friend of mine.
> 
> It turned out to be faster in Java --classes and all. For some reason,
> using the same algorithm (my code converted to Java, basically), Java
> beat my natively compiled code. And, no there was no GUI involved.

Without detailed code one can say nothing about it. Just I/O being done
wrong can ruin performance.

> 
> compiler guys all the same} and ask, instead, to give us
> reference-counted 4-byte (actually, preferably 6-bytes) per cell
> arrays/strings.
> 

What's wrong with a dyn. array of DWord?
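
For illustration, a sketch of what such a 4-bytes-per-cell array could look
like when filled from a UTF-16 string. TCodePoints and ToCodePoints are
hypothetical names invented for this example. The only non-trivial step is
combining a surrogate pair into one code point, and unpaired surrogates are
simply passed through unchanged.

```pascal
program CodePointArraySketch;
{$mode objfpc}{$H+}

type
  TCodePoints = array of DWord; // one cell per Unicode code point

// Hypothetical helper: decodes UTF-16 code units into code points.
function ToCodePoints(const S: UnicodeString): TCodePoints;
var
  i, n: Integer;
  First, Second: DWord;
begin
  Result := nil;
  SetLength(Result, Length(S)); // upper bound, shrunk at the end
  n := 0;
  i := 1;
  while i <= Length(S) do
  begin
    First := Ord(S[i]);
    Inc(i);
    // A high surrogate followed by a low surrogate forms one code point.
    if (First >= $D800) and (First <= $DBFF) and (i <= Length(S)) then
    begin
      Second := Ord(S[i]);
      if (Second >= $DC00) and (Second <= $DFFF) then
      begin
        First := $10000 + ((First - $D800) shl 10) + (Second - $DC00);
        Inc(i);
      end;
    end;
    Result[n] := First;
    Inc(n);
  end;
  SetLength(Result, n);
end;

var
  S: UnicodeString;
  CP: TCodePoints;
begin
  // U+1D11E MUSICAL SYMBOL G CLEF as a surrogate pair, followed by 'a'
  S := WideChar($D834);
  S := S + WideChar($DD1E) + 'a';
  CP := ToCodePoints(S);
  WriteLn(Length(S), ' code units, ', Length(CP), ' code points'); // 3 and 2
  WriteLn(CP[0] = $1D11E);                                         // TRUE
end.
```

The dynamic array is reference counted and indexable per code point, but it
has no '+' operator, no string literals and no implicit conversions, which
is the readability gap raised in the reply.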


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread listmember


>> But, I could write a gigantic data mining application, a database application
>> or a myriad of such apps that uses the above class without doing a single
>> pixel of GUI stuff.
>
> I'd like to see that: it will be guaranteed dog slow :(

Hmm.. maybe, maybe not.

Last year I wrote a natural language parser (Pascal) and gave the source to
a Java developer friend of mine.

It turned out to be faster in Java --classes and all. For some reason,
using the same algorithm (my code converted to Java, basically), Java
beat my natively compiled code. And, no, there was no GUI involved.

Basing my argument upon this world-shattering anecdotal evidence, I
hereby prove my point. So, there :P

> However, changing the object pascal language, so it requires the use of
> objects whenever you use strings: this is a different story.
>
> And that is what it was all about, after all.

Ooops! I joined too late then.

OK. I retract {I am said to come from a bargaining culture though I have
yet to hone my skills with a carpet dealer, but I'll try my luck with
compiler guys all the same} and ask, instead, to give us
reference-counted 4-byte (actually, preferably 6-bytes) per cell
arrays/strings.

If I can have such a beast, it will be fast enough and will also cover
almost all of the foreseeable problems.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Michael Van Canneyt

On Wed, 10 Sep 2008, listmember wrote:

> Michael Van Canneyt wrote:
> > You are mixing 2 things. There is the actual string content, and there is
> > the
> > string metadata. The metadata is something that would apply for flyweight
> > pattern. There is nothing to be gained by putting the metadata in an object,
> 
> This is true, up to a point.
> 
> And, that point arises when you wish to be able to work further with a
> TCharacter.
> 
> Say, you're doing text processing --display and all. You would definitely like
> to be able to derive a new class from TCharacter and call it, say,
> TWPCharacter which contains all sorts of other properties, color, style, font,
> size etc.
> 
> This would make life immensely easier for such jobs whereby a character may
> need to have more attributes than there exists in the base class.
> 
> > since there is only the encoding. Storing the encoding in an object is
> > ridiculous and a waste of heap space. a 2 byte encoding is less wasteful
> > than a 4 or 8 byte object pointer.
> 
> I am afraid I do not agree with this at all. Or rather, it comes across as a
> very ANSI-centric view.

You are mixing 2 things:

- Texts (strings) at the compiler language level.
- (complex) GUI design that needs to handle a lot of text and a lot of extra
  properties.

For GUI design, you may well need all the things you describe. 
And as I said before: you can do this yourself if you need it.

But at the _language level_, there is no need for all these things. 
They make simple usage of the language impossible. To burden the 
pascal language with all these things would be a serious mistake.

But there is nothing that stops people from doing all these things 
for themselves if they require it.

Michael.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Michael Van Canneyt

On Wed, 10 Sep 2008, listmember wrote:

> Michael Van Canneyt wrote:
> > You are mixing 2 things:
> >
> > - Texts (strings) at the compiler language level.
> > - (complex) GUI design that needs to handle a lot of text and a lot of extra
> >properties.
> 
> :)
> 
> If you draw the lines so red and thick, who am I to disagree...
> 
> But, I could write a gigantic data mining application, a database application
> or a myriad of such apps that uses the above class without doing a single
> pixel of GUI stuff.

I'd like to see that: it will be guaranteed dog slow :(

But that is not the point.

> > For GUI design, you may well need all the things you describe.
> > And as I said before: you can do this yourself if you need it.
> 
> True.
> 
> I could also do my own TList, TStringList etc. etc. but, luckily I don't have
> to.
> 
> I was under the impression, therefore, that stuff that makes life easier for a
> number of developers get to be included into the main distribution for common
> use; and not be rejected on the basis of /language level/ .

This is another discussion: we could very well decide to include such a
string/character handling class in the RTL or FCL, and you could use it.
I never said we would refuse such a set of classes. 

If certain - generally useful - language features are needed to implement 
such a set of classes, we could even decide to do that. I imagine that auto 
class instance destruction when the variable goes out of scope, is one of them. 
I have proposed something like it years ago, because it is broader in scope.

However, changing the object pascal language, so it requires the use of 
objects whenever you use strings: this is a different story. 

And that is what it was all about, after all.

Michael.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Marco van de Voort

In our previous episode, listmember said:
> > - Texts (strings) at the compiler language level.
> > - (complex) GUI design that needs to handle a lot of text and a lot of extra
> >properties.
> 
> If you draw the lines so red and thick, who am I to disagree...
> 
> But, I could write a gigantic data mining application, a database 
> application or a myriad of such apps that uses the above class without 
> doing a single pixel of GUI stuff.

True, and the ability to customize the string type even at the character
level would downright kill performance, because the compiler could no longer
exploit the knowledge it has now (a basic 16-bit array at the base level).
 
> > For GUI design, you may well need all the things you describe.
> > And as I said before: you can do this yourself if you need it.
> 
> True.
> 
> I could also do my own TList, TStringList etc. etc. but, luckily I don't 
> have to.

Don't get me started on tstringlist.
 
> I was under the impression, therefore, that stuff that makes life easier 
> for a number of developers get to be included into the main distribution 
> for common use; 

Yes. But, most importantly, it should not make life impossible for a group
of other developers (like the ones that have to process a multi-GB database
export regularly).

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread listmember


Michael Van Canneyt wrote:
> You are mixing 2 things:
>
> - Texts (strings) at the compiler language level.
> - (complex) GUI design that needs to handle a lot of text and a lot of extra
>   properties.

:)

If you draw the lines so red and thick, who am I to disagree...

But, I could write a gigantic data mining application, a database
application or a myriad of such apps that uses the above class without
doing a single pixel of GUI stuff.

> For GUI design, you may well need all the things you describe.
> And as I said before: you can do this yourself if you need it.

True.

I could also do my own TList, TStringList etc. etc. but, luckily I don't
have to.

I was under the impression, therefore, that stuff that makes life easier
for a number of developers gets included into the main distribution
for common use, and is not rejected on the basis of /language level/ .


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread listmember





> Yes, but most proposals here about a TCharacter are a bit overkill. For
> example, a language reference for a given char is not very important from
> a Unicode point of view; Unicode focuses its power on the text, so
> locale is important in context operations and collations.


See my other post above.

Locale should really have nothing to do with the text/string business.

Instead, it should only refer to oddities such as decimal number 
representations, thousands separators, date and time strings etc.


Packing the language into the 'locale' info is an abuse IMO, unless it 
refers to such things as what kind of help file it should display to the 
user or the actual strings on menu items (resources) etc.



> From my point of view the compiler's basic types must stay
> "basic": fast, using no more memory than needed, and so on.


Please don't take offence, but this kind of attitude is verging on being 
offensive.


Instead of looking at the issue from the POV of "I don't need it" or "It 
requires more hardware resources", can't you try to evaluate the need on 
its own merits?


And, if you still think that you will never need it, please remember 
that you don't have to --but others may.



> Bringing Unicode "power" to the basic string type is overkill: any
> Unicode operation will in the best case take twice the time, and
> some of them will be 40-50 times slower. A simple collation will take at least
> 4 times the memory needed by the string itself, and for the needs of most
> sort algorithms the collation is unnecessary.


So?

What if it is a fact of life?

Such as 24-bit graphics. We all know it takes a lot more resources and 
that only patsies need that much color; we ended up using it.


Can't you consider this Unicode character in the same light? (no pun).


> So think of a "new" user
> filling a TStringList with 1000 strings and invoking the Sort method;
> as the strings are Unicode they must be ordered using the locale
> collation or the general collation, and finally saying "20 seconds to
> sort 1000 strings, this looks even worse than javascript".


No. This is where you are mistaken, I'm afraid.

A TUnicodeStringList can contain strings from different collations and 
one 'locale' information will be useless in sorting out that mess. You 
need 'language' information in each of those strings to be able to 
properly sort that unicode list.



> Maybe, again from my point of view, it is more logical to create
> "TTextUnicodeChar" and "TTextUnicodeString" classes which handle
> Unicode textual data, not Unicode data.


I can't see how you can do that. I can't see how we can cater for 
Unicode data (not textual data, as you put it) in anything other than a 
specific class [or data type].



> PS: As one of the problems of Unicode support is the large amount of
> data that must be stored (in the exe or an external file), is there any
> recommended way to code so that unused arrays are left out when the
> function that uses that array is never called in the main program?


Storage is a completely different problem. You could use, say, UTF-8 
encoding and store also the language information when necessary.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread listmember


Michael Van Canneyt wrote:

> You are mixing 2 things. There is the actual string content, and there is the
> string metadata. The metadata is something that would apply for flyweight
> pattern. There is nothing to be gained by putting the metadata in an object,


This is true, up to a point.

And, that point arises when you wish to be able to work further with a 
TCharacter.


Say, you're doing text processing --display and all. You would 
definitely like to be able to derive a new class from TCharacter and 
call it, say, TWPCharacter which contains all sorts of other properties, 
color, style, font, size etc.


This would make life immensely easier for such jobs whereby a character 
may need to have more attributes than there exists in the base class.



> since there is only the encoding. Storing the encoding in an object is
> ridiculous and a waste of heap space. a 2 byte encoding is less wasteful
> than a 4 or 8 byte object pointer.


I am afraid I do not agree with this at all. Or rather, it comes across 
as a very ANSI-centric view.


You definitely need a 'language' attribute for a character.

'Locale' does not cut it simply because you can have mixed text i.e. 
portions that belong to a different language.


Some weird characters in my locale (say, Turkish) do not necessarily 
mean that that piece of string is in another language --it may well be a 
transcription of /my/ name in a different character set (say, Greek).


Yet, we all know that, (upper-, lower, title-) casing has nothing to do 
with the encoding; nor does collation order etc.


In the above example, I used Turkish and Greek {what an unfortunate 
pairing, some might say :) } on purpose:


Both of which follow their own case folding rules, as well as their own 
collation orders which are both dependent upon a language 
attribute/property.


Without a language attribute, how would you handle these sorts of issues?

Using a parallel byte array?

Really?

Wouldn't it be a lot more humane to us developers if the TCharacter had 
properties such as


-- Language
-- CollationOrder
-- UpperCase
-- LowerCase
-- TitleCase

where, on setting the Language property, all others get filled with their 
correct values and are read-only.


Cheers,
Adem
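
The Turkish point above can be illustrated with a very small sketch. It is only an illustration under stated assumptions: TLanguage and UpperCodeUnit are hypothetical names (nothing like them is claimed to exist in the RTL), and only ASCII letters plus the dotted/dotless i pair are handled.

```pascal
program LangCaseSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical language tag; a real design would need far more values.
  TLanguage = (langDefault, langTurkish);

// Hypothetical helper: upper-cases one UTF-16 code unit.
// Only ASCII letters and the Turkish dotted/dotless i are handled.
function UpperCodeUnit(c: WideChar; lang: TLanguage): WideChar;
begin
  case Ord(c) of
    Ord('i'):
      if lang = langTurkish then
        Result := WideChar($0130)   // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
      else
        Result := WideChar(Ord('I'));
    $0131:                          // U+0131 LATIN SMALL LETTER DOTLESS I
      Result := WideChar(Ord('I'));
    Ord('a')..Ord('h'), Ord('j')..Ord('z'):
      Result := WideChar(Ord(c) - 32);
  else
    Result := c;
  end;
end;

begin
  // Same input code unit, different result depending on the language tag.
  writeln('default: U+', HexStr(Ord(UpperCodeUnit(WideChar(Ord('i')), langDefault)), 4));
  writeln('Turkish: U+', HexStr(Ord(UpperCodeUnit(WideChar(Ord('i')), langTurkish)), 4));
end.
```

The same code unit 'i' maps to U+0049 by default and to U+0130 under the Turkish rule, which is why the encoding alone cannot decide the casing. Whether that language tag belongs on each character, on the string, or on the call is exactly what is being argued in this thread.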

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Marco van de Voort

In our previous episode, Graeme Geldenhuys said:
> >  The problem is how it applies to strings, and how they can be more
> >  memory saving than a straight array of 16-bit values which are
> >  copy-on-write.
> 
> I think for a good code example of this, have a look at Java's
> Document class. It's not exactly what I'm talking about, but it's got
> the idea. The Document class forms the basic storing medium of all
> their text based components - from a simple TextEdit, TextArea to
> complex rich text documents. So it scales well.

How can you say that? There the limit is whether a person notices it, but a main
string type must also be used for server systems that import a database export
of several GB.

> Each character can have individual characteristics set. Storage down
> to character level. Similar to what I am suggesting with the Flyweight
> pattern - characters of a string with encoding information.

I can't see how you could stuff that in less than 16 bits? (since that would
be the storage now, and you said it would save memory) 
 
> The Document class also uses an internal gapped buffer implementation
> to store it's content - apparently good for performance. 

It is one of many ways to avoid big delays on big continuous documents.

E.g. Word uses (classically) a different approach, where the document is a
set of references to paragraphs. That way you can swap entire paragraphs by
manipulating a few pointers.

It is also totally unrelated to stringhandling.

> Again something like this could be used in the "character pool" manager
> object - though I'm not 100% sure.

Which, what, where, why character pool manager object? How 

> Please note, this is just a thought. I haven't written any Object
> Pascal code implementing something like this - to prove the concept. I
> simply know the Flyweight pattern and it seems to be a possible
> option.

And we are trying to get to the bottom of that feeling.

Let me summarize this ENTIRE discussion up to now (this also goes for
the other posters):

1a) objects -> good   
1b) not object -> not good
2) flyweight pattern will be a good string type.
3) A "+" for string concatenation is frowned upon in good OOP circles.
4) The Java string type is an immutable object.
5) C++ _possibly_ has some problems efficiently coding s[x] using class string
   types.

Which for practical relevance to the unicodestring type can be further
summarized to the empty set.

So in short: while I'm not entirely fond of an OOP approach to strings
(simply because I have never seen one that fits in with a language as
Delphi/FPC), I'm willing to hear the arguments.

But we are now several tens of posts in this subthread, and there has been
absolutely no information at all!

> Remember, Unicode support is much more that simply storing and
> displaying text

Displaying text is already pretty much out of the scope of the unicodestring
type that is the subject of this thread.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Florian Klaempfl

Graeme Geldenhuys wrote:
> Remember, Unicode support is much more than simply storing and
> displaying text. You have various encodings, RTL or LTR direction etc.
>  I can't see how a simple type can keep track of all such information
> - but then, I don't know the internals of FPC either.  ;-)

How would an OOP approach solve this? The problem isn't the tracking of
things like encoding or direction but handling all this information.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Michael Van Canneyt

On Wed, 10 Sep 2008, Graeme Geldenhuys wrote:

> On 9/10/08, Marco van de Voort <[EMAIL PROTECTED]> wrote:
> >
> > Like everybody, I have read GOF several times, and even got some of the
> >  successor books.
> 
> I don't think anybody has read GOF only once.  :-)
> 
> 
> >  The problem is how it applies to strings, and how they can be more
> >  memory saving than a straight array of 16-bit values which are
> >  copy-on-write.
> 
> I think for a good code example of this, have a look at Java's
> Document class. It's not exactly what I'm talking about, but it's got
> the idea. The Document class forms the basic storing medium of all
> their text based components - from a simple TextEdit, TextArea to
> complex rich text documents. So it scales well.
> 
> Each character can have individual characteristics set. Storage down
> to character level. Similar to what I am suggesting with the Flyweight
> pattern - characters of a string with encoding information.
> 
> The Document class also uses an internal gapped buffer implementation
> to store it's content - apparently good for performance.  Again
> something like this could be used in the "character pool" manager
> object - though I'm not 100% sure.
> 
> 
> Please note, this is just a thought. I haven't written any Object
> Pascal code implementing something like this - to prove the concept. I
> simply know the Flyweight pattern and it seems to be a possible
> option.
> 
> Remember, Unicode support is much more that simply storing and
> displaying text. You have various encodings, RTL or LTR direction etc.
>  I can't see how a simple type can keep track of all such information
> - but then, I don't know the internals of FPC either.  ;-)

You are mixing 2 things. There is the actual string content, and there is the 
string metadata. The metadata is something that would apply for flyweight
pattern. There is nothing to be gained by putting the metadata in an object,
since there is only the encoding. Storing the encoding in an object is 
ridiculous and a waste of heap space. A 2-byte encoding is less wasteful
than a 4 or 8 byte object pointer.

The main problem with the GOF book is that
"If your only tool is a hammer, you tend to think of every problem as a nail."

Objects are not the ne plus ultra of programming. They are useful in a
very broad area, but not everything should be done in Objects, because
they do give overhead.

Strings are such a case where objects are simply too cumbersome.

Michael.
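
The size argument above can be checked on any target with a few lines. This is a minimal sketch, assuming a plain array of UTF-16 code units on one side and one object reference per character on the other; CharCount is an arbitrary example value and the per-object heap cost is not even counted.

```pascal
program SizeSketch;
{$mode objfpc}{$H+}

const
  CharCount = 1000;  // arbitrary example length

begin
  // Size of one element in each representation on the current target.
  writeln('UTF-16 code unit : ', SizeOf(WideChar), ' bytes');
  writeln('object reference : ', SizeOf(Pointer), ' bytes');
  // Storage for the elements alone, without any per-object overhead.
  writeln('array of code units: ', CharCount * SizeOf(WideChar), ' bytes');
  writeln('array of references: ', CharCount * SizeOf(Pointer), ' bytes');
end.
```

On a 32-bit target the reference array is twice as large as the code unit array, on a 64-bit target four times as large.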

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Graeme Geldenhuys

On 9/10/08, Marco van de Voort <[EMAIL PROTECTED]> wrote:
>
> Like everybody, I have read GOF several times, and even got some of the
>  successor books.

I don't think anybody has read GOF only once.  :-)

>  The problem is how it applies to strings, and how they can be more
>  memory saving than a straight array of 16-bit values which are
>  copy-on-write.

I think for a good code example of this, have a look at Java's
Document class. It's not exactly what I'm talking about, but it's got
the idea. The Document class forms the basic storing medium of all
their text based components - from a simple TextEdit, TextArea to
complex rich text documents. So it scales well.

Each character can have individual characteristics set. Storage down
to character level. Similar to what I am suggesting with the Flyweight
pattern - characters of a string with encoding information.

The Document class also uses an internal gapped buffer implementation
to store its content - apparently good for performance.  Again
something like this could be used in the "character pool" manager
object - though I'm not 100% sure.

Please note, this is just a thought. I haven't written any Object
Pascal code implementing something like this - to prove the concept. I
simply know the Flyweight pattern and it seems to be a possible
option.

Remember, Unicode support is much more than simply storing and
displaying text. You have various encodings, RTL or LTR direction etc.
 I can't see how a simple type can keep track of all such information
- but then, I don't know the internals of FPC either.  ;-)

Regards,
  - Graeme -

___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Marco van de Voort

In our previous episode, Graeme Geldenhuys said:
> > this ever save memory?
> 
> Please read the following...
> 
> http://exciton.cs.rice.edu/JavaResources/DesignPatterns/FlyweightPattern.htm
> 
> http://en.wikipedia.org/wiki/Flyweight_pattern
> 
> Design Patterns - Elements of Reusable Object-Oriented Software
>  (aka GOF book)
> "Most contemporary document editors don't use an object for every
> character, presumably for efficiency reasons. Calder demonstrated that
> this approach is feasible in his thesis [Cal93]. Calder's glyphs can
> be shared to reduce storage costs, thereby forming directed-acyclic
> graph structures. We can apply the Flyweight pattern to get the same
> effect." - A Case Study (chapter)
> 
> [Cal93] - Paul R. Calder. Building User Interfaces with Lightweight
> Objects. PhD thesis, Stanford University, 1993.

Like everybody, I have read GOF several times, and even got some of the
successor books.

The problem is how it applies to strings, and how they can be more
memory saving than a straight array of 16-bit values which are
copy-on-write.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Mattias Gärtner

Quoting Graeme Geldenhuys <[EMAIL PROTECTED]>:

> On 9/10/08, Micha Nelissen <[EMAIL PROTECTED]> wrote:
> > > TCharacter and TString to be more intelligent with what encoding it
> > > represents etc... And if you have an application with many strings, it
> > > might actually save memory, because flyweight objects are used from a
> > > pool.
> > >
> >
> >  Save memory?
> >  1) storing information for each character
> >  2) pool retains old classes I assume, so consumes unused memory; how can
> > this ever save memory?
>
> Please read the following...
>
> http://exciton.cs.rice.edu/JavaResources/DesignPatterns/FlyweightPattern.htm
>
> http://en.wikipedia.org/wiki/Flyweight_pattern
>
> Design Patterns - Elements of Reusable Object-Oriented Software
>  (aka GOF book)
> "Most contemporary document editors don't use an object for every
> character, presumably for efficiency reasons. Calder demonstrated that
> this approach is feasible in his thesis [Cal93]. Calder's glyphs can
> be shared to reduce storage costs, thereby forming directed-acyclic
> graph structures. We can apply the Flyweight pattern to get the same
> effect." - A Case Study (chapter)

This is about glyphs, not values of characters.


> [Cal93] - Paul R. Calder. Building User Interfaces with Lightweight
> Objects. PhD thesis, Stanford University, 1993.
>
>
> Also related to your point (2). Reference counted objects can be used.
> So "old" objects get freed automatically.

The reference will need at least a UTF18 sized value, which for speed reasons
will probably result in 3 bytes. So for human readable texts the memory will be
comparable to non class based unicode strings. It does not save memory, but it
does not cost more either.
But IMO it costs a lot of speed. This is not so important for text editors,
where the glyphs, unicode, rtl, tabs, ... processing takes the biggest part of
the time. For all other string algorithms I need the speed of arrays and base
types.


Mattias


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Graeme Geldenhuys

On 9/10/08, Graeme Geldenhuys <[EMAIL PROTECTED]> wrote:
>
> Please read the following...
>
>  http://exciton.cs.rice.edu/JavaResources/DesignPatterns/FlyweightPattern.htm
>
>  http://en.wikipedia.org/wiki/Flyweight_pattern
>
>  Design Patterns - Elements of Reusable Object-Oriented Software
>   (aka GOF book)
>  "Most contemporary document editors don't use an object for every
>  character, presumably for efficiency reasons. Calder demonstrated that
>  this approach is feasible in his thesis [Cal93]. Calder's glyphs can
>  be shared to reduce storage costs, thereby forming directed-acyclic
>  graph structures. We can apply the Flyweight pattern to get the same
>  effect." ― A Case Study (chapter)
>
>  [Cal93] - Paul R. Calder. Building User Interfaces with Lightweight
>  Objects. PhD thesis, Stanford University, 1993.

I forgot to add the reference to the Flyweight Pattern (falls under
Structural Patterns) on page 195 in GOF book.


Regards,
  - Graeme -



Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Graeme Geldenhuys

On 9/10/08, Micha Nelissen <[EMAIL PROTECTED]> wrote:
> > TCharacter and TString to be more intelligent with what encoding it
> > represents etc... And if you have an application with many strings, it
> > might actually save memory, because flyweight objects are used from a
> > pool.
> >
>
>  Save memory?
>  1) storing information for each character
>  2) pool retains old classes I assume, so consumes unused memory; how can
> this ever save memory?

Please read the following...

http://exciton.cs.rice.edu/JavaResources/DesignPatterns/FlyweightPattern.htm

http://en.wikipedia.org/wiki/Flyweight_pattern

Design Patterns - Elements of Reusable Object-Oriented Software
 (aka GOF book)
"Most contemporary document editors don't use an object for every
character, presumably for efficiency reasons. Calder demonstrated that
this approach is feasible in his thesis [Cal93]. Calder's glyphs can
be shared to reduce storage costs, thereby forming directed-acyclic
graph structures. We can apply the Flyweight pattern to get the same
effect." ― A Case Study (chapter)

[Cal93] - Paul R. Calder. Building User Interfaces with Lightweight
Objects. PhD thesis, Stanford University, 1993.

Also related to your point (2). Reference counted objects can be used.
So "old" objects get freed automatically.

Regards,
  - Graeme -


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Micha Nelissen


Graeme Geldenhuys wrote:

> TCharacter and TString to be more intelligent with what encoding it
> represents etc... And if you have an application with many strings, it
> might actually save memory, because flyweight objects are used from a
> pool.


Save memory?
1) storing information for each character
2) pool retains old classes I assume, so consumes unused memory; how can 
this ever save memory?


Micha

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Anton Kavalenka


> I fully agree with you. I would like the object oriented way of strings
> also - but I stopped asking for that ;) There are a lot of advantages
> over the small amount of disadvantages. Of course I don't like this one:
>
> S := TString.Create('');
>
> But a built-in class TString that is managed by the compiler.
>
> PS: Maybe I'm a little bit more up to date about today's concepts of
> object oriented languages - maybe because I know him personally
> http://en.wikipedia.org/wiki/Bertrand_Meyer
> There were a lot of interesting discussions, etc...  although I don't like
> Eiffel :)
>
> and also this guy was one of my profs:
> http://en.wikipedia.org/wiki/Niklaus_Wirth
>
> greetings

  

Yet another approach:

var
  s: string;
  intStr: TInternalStringClass absolute s;

TInternalStringClass(s).AsUTF8 := 'Some string';
writeln('String length=', intStr.Length);

TMyStringClass = class(TInternalStringClass)
  class function LoadFromResource(nResId: integer);
end;

intStr.LoadFromResource(nResId);


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Florian Klaempfl

Marco van de Voort wrote:
> In our previous episode, Ivo Steinmann said:
>> I fully agree with you. I would like the object oriented way of strings
>> also - but I stopped asking for that ;) There are a lot of advantages
>> over the small amount of disadvantages. Of course I don't like this one:
>>
>> S := TString.Create('');
>>
>> But a built-in class TString that is managed by the compiler.
>>
>> PS: Maybe I'm a little bit more up to date about today's concepts of
>> object oriented languages - maybe because I know him personally
>> http://en.wikipedia.org/wiki/Bertrand_Meyer
>> There were a lot of interesting discussions, etc...  although I don't like
>> Eiffel :)
> 
> 
> I think it is less the object orientation but the possible customization
> that is interesting. 

Did you ever see anybody using the variants customization :)?

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-10 Thread Marco van de Voort

In our previous episode, Ivo Steinmann said:
> I fully agree with you. I would like the object oriented way of strings
> also - but I stopped asking for that ;) There are a lot of advantages
> over the small amount of disadvantages. Of course I don't like this one:
> 
> S := TString.Create('');
> 
> But a built-in class TString that is managed by the compiler.
> 
> PS: Maybe I'm a little bit more up to date about today's concepts of
> object oriented languages - maybe because I know him personally
> http://en.wikipedia.org/wiki/Bertrand_Meyer
> There were a lot of interesting discussions, etc...  although I don't like
> Eiffel :)


I think it is less the object orientation but the possible customization
that is interesting. But I never liked an existing solution better than the
solution we have now. And while I have not seen all languages, I've seen
more than a few.

Moreover what always strikes me (and apparently Florian too, judging by his
reaction) is the total lack of (detailed) arguments. All we have till now
are Anton's two lines of pseudo code.

If people are really interested in this, the least you can do is come with
real evidence, comparisons, and not with just a few gratuitous soundbites.

If you learned so much from Meyer, write something about it.

See also http://www.freepascal.org/faq.var#extensionselect

Use the wiki for all I care, but do something, and be prepared to find
solutions for criticism.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Graeme Geldenhuys

On Wed, Sep 10, 2008 at 7:15 AM, listmember <[EMAIL PROTECTED]> wrote:
> 1) since each character is a class, memory requirements are increased
> several fold.
>
> 2) Again, the character-as-class also means that the speed with which we can
> create and destroy (and manipulate) a string is a lot slower.

I'm not saying that FPC needs to do any of this, I am simply
commenting with my experience in OOP. Both the above issues could be
addressed (or minimised) by using the Flyweight and Proxy design
patterns. See the GOF (Gang of Four) book where they create a rich
text editor. In your example, a TCharacter instance could be shared
between other occurrences of the same character in a single string (TString),
or there could even be a global pool of TCharacters shared between many
strings. When nobody (no TString) is referencing a TCharacter instance
in the pool, it can be freed. Just like reference counted objects.

I don't know the internals of FPC, so I can't say if this is more or
less work than the current implementations.  But it does allow
TCharacter and TString to be more intelligent with what encoding it
represents etc... And if you have an application with many strings, it
might actually save memory, because flyweight objects are used from a
pool.

Regards,
 - Graeme -
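
The shared pool described above could look roughly like the following. This is only a sketch under assumptions: TCharacter is the name used in this thread, while TCharacterPool, its Get method and the fixed 64K table are hypothetical choices made for the example, and the reference counting mentioned above is left out.

```pascal
program FlyweightSketch;
{$mode objfpc}{$H+}

type
  // Flyweight: one shared, immutable instance per distinct code unit.
  TCharacter = class
  public
    Code: WideChar;
    constructor Create(ACode: WideChar);
  end;

  // Hypothetical pool that hands out the shared instances.
  TCharacterPool = class
  private
    FItems: array[0..$FFFF] of TCharacter;  // nil until first requested
  public
    destructor Destroy; override;
    function Get(ACode: WideChar): TCharacter;
  end;

constructor TCharacter.Create(ACode: WideChar);
begin
  inherited Create;
  Code := ACode;
end;

destructor TCharacterPool.Destroy;
var
  i: Integer;
begin
  for i := Low(FItems) to High(FItems) do
    FItems[i].Free;  // Free is safe on nil entries
  inherited Destroy;
end;

function TCharacterPool.Get(ACode: WideChar): TCharacter;
begin
  // Create the instance on first use, then always return the same one.
  if FItems[Ord(ACode)] = nil then
    FItems[Ord(ACode)] := TCharacter.Create(ACode);
  Result := FItems[Ord(ACode)];
end;

var
  Pool: TCharacterPool;
  S: UnicodeString;
  Refs: array of TCharacter;
  i: Integer;
begin
  Pool := TCharacterPool.Create;
  try
    S := 'banana';
    SetLength(Refs, Length(S));
    for i := 1 to Length(S) do
      Refs[i - 1] := Pool.Get(S[i]);
    // Both 'a' positions point at the same pooled object.
    writeln('same instance for both "a": ', Refs[1] = Refs[3]);
    writeln('bytes as references  : ', Length(Refs) * SizeOf(Pointer));
    writeln('bytes as UTF-16 array: ', Length(S) * SizeOf(WideChar));
  finally
    Pool.Free;
  end;
end.
```

The objects themselves are shared, but the string still needs one reference per character, so the last two lines show the trade-off that the rest of the thread argues about.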


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread listmember


Graeme Geldenhuys wrote:

> I have to say I agree with you. The Object Pascal / Delphi language
> already has way too many string types!  And it's just getting worse.
>
> I've always liked the Java style of everything being an object - even
> the string type.


The more I look at this Unicode issue, the more I believe we need a 
fundamental object approach to it.


I mean, before a TString class, we need a TCharacter class in which we 
need to specify --amongst other things-- what language that character 
belongs to.


This kind of information is needed in order to properly manage the 
(upper-, lower-, title-, and camel-?) casing issues.


On top of this, we also need this information in order to be able to mix 
and match and display the LTR (left-to-right) and RTL (right-to-left) 
pieces of strings within the same string.


I have done some work on this, but there are at least 2 issues:

1) since each character is a class, memory requirements are increased 
several fold.


2) Again, the character-as-class also means that the speed with which we 
can create and destroy (and manipulate) a string is a lot slower.


I am, at this point, wondering if FPC's object creation/destroy code 
could be more optimized to be faster to help with this issue.


3) How do you handle the character sets when characters are objects?


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Thaddy


peter green wrote:

> I have just checked the manual and I don't see anything I can use to 
> make sure my custom type starts at a predictable state initially 
> (necessary so the assignment operator can safely clean up before 
> making the assignment). Nor do I see anything to do automatic clean up 
> when the variable goes out of scope.

That's the point.

> You don't have to! With the java system the string type is immutable 
> anyway so there is no point in doing a deep copy.

Which is imo - in the case of Java, but especially in the case of c++ - 
proven to be not a very smart idea. You want both and you want them 
recognizable by the compiler.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Florian Klaempfl


Jonas Maebe wrote:

> On 09 Sep 2008, at 21:37, Florian Klaempfl wrote:
>
> > Even C++'s is not good enough to do a ref. counted string in an 
> > efficient way. Just consider the [...] operator which needs to 
> > distinguish between reads and writes to avoid unnecessary unique calls.
>
> Can't you have a const and non-const version of the [] operator in C++?

I tried something similar once with an older gcc but I didn't get it 
working. Maybe it's possible with newer ones.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread peter green




> Check again...

I have just checked the manual and I don't see anything I can use to 
make sure my custom type starts at a predictable state initially 
(necessary so the assignment operator can safely clean up before making 
the assignment). Nor do I see anything to do automatic clean up when the 
variable goes out of scope.

> But it is still a bad idea (like c++) How does one recognize a deep vs 
> shallow string copy f.e. 

You don't have to! With the java system the string type is immutable 
anyway so there is no point in doing a deep copy. With the delphi/fpc 
system the string type automatically makes a shallow copy initially and 
then copies the actual data if and when it becomes necessary to do so.
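
The shallow-copy-then-copy-on-write behaviour described above can be observed with a small test program. This is a sketch, assuming the usual FPC AnsiString implementation; comparing the raw data pointers is an implementation detail used here for illustration only, not a documented technique.

```pascal
program CowSketch;
{$mode objfpc}{$H+}

var
  a, b: AnsiString;
  c: Char;
begin
  a := 'hello';
  b := a;        // shallow copy: only the pointer to the data is copied
  writeln('shared after assignment: ', Pointer(a) = Pointer(b));
  c := b[1];     // a read does not need a private copy
  writeln('shared after read      : ', Pointer(a) = Pointer(b));
  b[1] := 'j';   // a write makes b unique first, so a is not affected
  writeln('shared after write     : ', Pointer(a) = Pointer(b));
  writeln(a, ' / ', b, ' / ', c);
end.
```

Because the compiler knows whether s[x] is being read or written, it only inserts the unique call for the write. That is the distinction that is hard to get from a library-level [] operator.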





Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Jonas Maebe



On 09 Sep 2008, at 21:37, Florian Klaempfl wrote:

> Even C++'s is not good enough to do a ref. counted string in an  
> efficient way. Just consider the [...] operator which needs to  
> distinguish between reads and writes to avoid unnecessary unique calls.


Can't you have a const and non-const version of the [] operator in C++?


Jonas

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Florian Klaempfl


Martin Schreiber wrote:
> On Sunday 07 September 2008 21.23:24 Florian Klaempfl wrote:
> > > Trunk 11723 does not compile:
> >
> > Trunk or unicodestring branch? Strange, because here it works?
>
> Unicodestring branch, sorry, I should change the directory name of my switched 
> checkout. Does your unicodestring branch compile?

Fixed in rev. 11734

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Florian Klaempfl


peter green wrote:
> 3: Use an automatic reference counting system either implemented in the 
> compiler (the delphi/fpc way) or implemented using a very powerful 
> operator overloading system (the C++ way, last I checked freepascal did 
> not have sufficient operator overloading capabilities to implement this)

Even C++'s is not good enough to do a ref. counted string in an 
efficient way. Just consider the [...] operator which needs to 
distinguish between reads and writes to avoid unnecessary unique calls.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Thaddy


peter green wrote:
>> I fully agree with you. I would like the object oriented way of strings
>> also - but I stopped asking for that ;) There are a lot of advantages
>> over the small amount of disadvantages.
>
> Which object-oriented way of doing strings?
>
> As I see it there are three main ways of doing variable-length strings.
>
> 1: Let the programmer manage the memory lifetime (the C way). This is 
> tedious, error-prone and generally results in lots of unnecessary 
> copying of strings, since it is easier for the programmer to have 
> separate copies owned by different objects than to manage shared strings.
> 2: Use immutable objects and let the garbage collector clean them up 
> (the Java way). This works, but since the strings are immutable they 
> must be copied to make any modification. It also relies on a garbage 
> collector with all its associated problems.
> 3: Use an automatic reference counting system either implemented in 
> the compiler (the Delphi/FPC way) or implemented using a very 
> powerful operator overloading system (the C++ way; last I checked, 
> Free Pascal did not have sufficient operator overloading capabilities 
> to implement this)





  

Check again...
But it is still a bad idea (like C++). How does one recognize a deep vs 
shallow string copy, for example? This is really basic. And rather uninformed 
as well.



Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread peter green




> I fully agree with you. I would like the object oriented way of strings
> also - but I stopped asking for that ;) There are a lot of advantages
> over the small amount of disadvantages.

Which object-oriented way of doing strings?

As I see it there are three main ways of doing variable-length strings.

1: Let the programmer manage the memory lifetime (the C way). This is 
tedious, error-prone and generally results in lots of unnecessary 
copying of strings, since it is easier for the programmer to have 
separate copies owned by different objects than to manage shared strings.
2: Use immutable objects and let the garbage collector clean them up 
(the Java way). This works, but since the strings are immutable they must 
be copied to make any modification. It also relies on a garbage 
collector with all its associated problems.
3: Use an automatic reference counting system either implemented in the 
compiler (the Delphi/FPC way) or implemented using a very powerful 
operator overloading system (the C++ way; last I checked, Free Pascal did 
not have sufficient operator overloading capabilities to implement this)




Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Florian Klaempfl


Ivo Steinmann wrote:
> I fully agree with you. I would like the object oriented way of strings
> also - but I stopped asking for that ;) There are a lot of advantages

Which ones? Really, I want to know :)

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Ivo Steinmann

Anton Kavalenka wrote:
> Florian Klaempfl wrote:
>> I've continued to work on support of an unicodestring type in fpc.
>> It's currently in an svn branch at:
>> http://svn.freepascal.org/svn/fpc/branches/unicodestring
>> and will be merged later to trunk. The unicodestring type is a ref.
>> counted utf-16 string. On non-windows, widestring is mapped to this
>> type. If you're interested in unicode support please test, give
>> feedback here and submit fixes.
>>
>> An existing working copy of trunk can be switched to this branch by
>> cd fpc
>> svn switch http://svn.freepascal.org/svn/fpc/branches/unicodestring
>> and back with
>> svn switch http://svn.freepascal.org/svn/fpc/trunk
>> ___
>> fpc-devel maillist  -  fpc-devel@lists.freepascal.org
>> http://lists.freepascal.org/mailman/listinfo/fpc-devel
>>
> The Pascal huge strings always annoy me.
> Since - it is IMPLICIT automatic object with set of overloaded
> methods, length and reference count fields etc hidden from developer.
>
> In near future we geat a Zoo of the strings:
> AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar
> Some of them with encoding field.
>
> Why not to make it EXPLICIT object
>
> s:=TCoolFPCString.Create('Test');
> s2:=TCoolFPCString.Create('Проверка'); //UTF8 encoded constant
> s.asUtf8+=s2;
>
> SetWindowTextW(WinHandle,s.AsUnicodeString); // i explicitly say - get
> me wide string and DO not any compiler magic
>
> if (s1.length=length(s2))... // generic runtime function length
> returns the property of cool object
>
> s1.AcquireLock // prevent other threads acccess
> s1.Clear;
> s1.LoadFromResource(n_ReasourceId); // just use GNU gettext
> s1.LoadTranslationFromResource(n_resID,'be_BY');
> s1.ReleaseLock // allow other thread access
>
> Anyway I just can subclass standard string and get a new functionality
> with reachness of availabel fields and methods.
>
>
> FPC supports operators - so there is lots of way to represent the
> string, assign the string, load it from resource.
> Make it thread-safe at implementation level but not at compiler level.
> Standard string, unicode string , ansistring, widestring can be
> implemented as wrappers along this object.
> It seems like in mseGUI it is done.
>
>
>
> 
>
> ___
> fpc-devel maillist  -  fpc-devel@lists.freepascal.org
> http://lists.freepascal.org/mailman/listinfo/fpc-devel
>   
I fully agree with you. I would like the object oriented way of strings
also - but I stopped asking for that ;) There are a lot of advantages
over the small amount of disadvantages. Of course I don't like this one:

S := TString.Create('');

I would rather have a built-in class TString that is managed by the compiler.



PS: Maybe I'm a little bit more up to date about today's concepts of
object oriented languages - maybe because I know him personally
http://en.wikipedia.org/wiki/Bertrand_Meyer
There were a lot of interesting discussions, etc... although I don't like
Eiffel :)

and also this guy was one of my profs:
http://en.wikipedia.org/wiki/Niklaus_Wirth

greetings

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Joao Morais


Anton Kavalenka wrote:
> I only have a dream - controllable way of string assignment without any 
> magic like implicit call of _LStrAddRefCnt

Do you have a real-world sample of usage, i.e., where or when the Object 
Pascal way is a problem?


Joao Morais

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Anton Kavalenka


Michael Van Canneyt wrote:
> On Tue, 9 Sep 2008, Anton Kavalenka wrote:
>
>>> Nothing stops you from doing this yourself.
>>>
>>> But for something as basic as text operations, I think this is bloat.
>>>
>>> Imagine that you would have to do
>>>   I:=TInteger.Create(1);
>>>   J:=TInteger.Create(2);
>>>   I.Add(J);
>>> What kind of language do you end up with then ? Utterly unreadable, and
>>> slow, because heavily relying on the heap.
>>>
>>> Michael.
>>
>> Bad example
>> Numbers are scalars
>> Strings are vectors
>> += operator is not so straightforward as for numbers.
>
> bad example for you, but not for me: Handling strings should be as
> easy as handling integers.
>
>> Who else except Pascal developers knows that s:=s1+s2 is the string
>> concatenation and invokes lot of hidden stuff that is out of control.
>
> This is the beauty of pascal: you don't need to know, and there should
> be no need.
>
> I once asked a C++ programmer how to read a file full of strings.
> After 2 hours he came to tell me he didn't know.
>
> In Pascal, it takes about 1 minute to code, because strings are a 
> basic type, handled on the stack. And rightly so.
>
> Michael. 

:-)
This is not a holy war of C++ vs Pascal.
If a C++ programmer doesn't know about fstream descendants - send him back 
to school (or actually (he|she) is a VB programmer).


I only have a dream - a controllable way of string assignment without any 
magic like the implicit call of _LStrAddRefCnt






Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread ik

On Tue, Sep 9, 2008 at 2:23 PM, Graeme Geldenhuys
<[EMAIL PROTECTED]> wrote:
> On 9/9/08, Anton Kavalenka <[EMAIL PROTECTED]> wrote:
>>  The Pascal huge strings always annoy me.
>>  Since - it is IMPLICIT automatic object with set of overloaded methods,
>> length and reference count fields etc hidden from developer.
>>
>>  In near future we geat a Zoo of the strings:
>>  AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar
>>  Some of them with encoding field.
>
> I have to say I agree with you. The Object Pascal / Delphi language
> already has way too many string types!  And it's just getting worse.

Actually I find this to be a good feature. In C, for example, you will
find a lot of typedefs that resolve to int or long int, and you can
understand that time_t is about working with time, while pid_t is
about pids, etc. They are all integer types, but it is easier to
understand their uses. Sure, it means that you must have better
documentation out there, but I think it is worth it.

>
> I've always liked the Java style of everything being an object - even
> the string type.

It is always the thing I dislike in Java.
For example, in languages such as Ruby/Python everything is a true
object (including nil in Ruby), but you do not "need" that when you
do not call methods on the values, and therefore languages like Java and C++
become bloatware, because there is far too much information to
compile into the binary. In Pascal (using smart linking) you can link in only
the things you use (but with OO it does not work like that).

>
>
> Regards,
>  - Graeme -
>
>

Ido
-- 
http://ik.homelinux.org/

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Michael Van Canneyt

On Tue, 9 Sep 2008, Anton Kavalenka wrote:

> 
> > Nothing stops you from doing this yourself.
> >
> > But for something as basic as text operations, I think this is bloat.
> >
> > Imagine that you would have to do
> >   I:=TInteger.Create(1);
> >   J:=TInteger.Create(2);
> >   I.Add(J);
> > What kind of language do you end up with then ? Utterly unreadable, and
> > slow, because heavily relying on the heap.
> >
> > Michael.
> > ___
> > fpc-devel maillist  -  fpc-devel@lists.freepascal.org
> > http://lists.freepascal.org/mailman/listinfo/fpc-devel
> >
> >   
> Bad example
> Numbers are scalars
> Strings are vectors
> += operator in not so straightforward as for numbers.

bad example for you, but not for me: Handling strings should be as
easy as handling integers.

> 
> Who else except Pascal developers knows that s:=s1+s2 is the string
> concatenation and invokes lot of hidden stuff that is out of control.

This is the beauty of pascal: you don't need to know, and there should
be no need.

I once asked a C++ programmer how to read a file full of strings.
After 2 hours he came to tell me he didn't know.

In Pascal, it takes about 1 minute to code, because strings are a 
basic type, handled on the stack. And rightly so.
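
For illustration, a sketch of what such a one-minute program might look 
like, assuming a plain text file with one string per line. CountLines and 
the command-line handling are hypothetical names chosen for this example, 
not something posted in the thread.

```pascal
program readstrings;
{$mode objfpc}{$H+}

// Reads the text file line by line and returns the number of lines read.
// When Echo is True each line is also written to standard output.
function CountLines(const FileName: string; Echo: Boolean): Integer;
var
  f: Text;
  line: string;
begin
  Result := 0;
  Assign(f, FileName);
  Reset(f);                 // open for reading; a missing file is a runtime error
  try
    while not Eof(f) do
    begin
      ReadLn(f, line);      // the string grows as needed, no buffer management
      if Echo then
        WriteLn(line);
      Inc(Result);
    end;
  finally
    Close(f);
  end;
end;

begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: readstrings <file>');
    Halt(1);
  end;
  WriteLn(CountLines(ParamStr(1), True), ' lines read');
end.
```

The string variable is declared like any other local and needs no explicit 
allocation or clean-up, which is the convenience being argued for here.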

Michael. 

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Anton Kavalenka




> Nothing stops you from doing this yourself.
>
> But for something as basic as text operations, I think this is bloat.
>
> Imagine that you would have to do
>   I:=TInteger.Create(1);
>   J:=TInteger.Create(2);
>   I.Add(J);
> What kind of language do you end up with then ? Utterly unreadable, and 
> slow, because heavily relying on the heap.
>
> Michael.

Bad example
Numbers are scalars
Strings are vectors
The += operator is not as straightforward as for numbers.

Who else except Pascal developers knows that s:=s1+s2 is string 
concatenation and invokes a lot of hidden stuff that is out of control?

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Marco van de Voort

In our previous episode, Graeme Geldenhuys said:

> >  The Pascal huge strings always annoy me. Since - it is IMPLICIT
> >  automatic object with set of overloaded methods,
> >  length and reference count fields etc hidden from developer.
> >
> >  In near future we geat a Zoo of the strings:
> >  AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar
> >  Some of them with encoding field.
> 
> I have to say I agree with you. The Object Pascal / Delphi language
> already has way too many string types!  And it's just getting worse.

Well, then only use one? What is the problem? As soon as the RTL is
unicodestring enabled, throw away anything that is not unicode, create
everything new in unicode, and be done with it.

Legacy always causes ballast.
 
> I've always liked the Java style of everything being an object - even
> the string type.

It creates a lot of trouble (very visible in Java with its need for
StringBuilder), but it is not exactly clear what it solves.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Michael Van Canneyt



On Tue, 9 Sep 2008, Anton Kavalenka wrote:

> Florian Klaempfl wrote:
> > I've continued to work on support of an unicodestring type in fpc. It's
> > currently in an svn branch at:
> > http://svn.freepascal.org/svn/fpc/branches/unicodestring
> > and will be merged later to trunk. The unicodestring type is a ref. counted
> > utf-16 string. On non-windows, widestring is mapped to this type. If you're
> > interested in unicode support please test, give feedback here and submit
> > fixes.
> >
> > An existing working copy of trunk can be switched to this branch by
> > cd fpc
> > svn switch http://svn.freepascal.org/svn/fpc/branches/unicodestring
> > and back with
> > svn switch http://svn.freepascal.org/svn/fpc/trunk
> > ___
> > fpc-devel maillist  -  fpc-devel@lists.freepascal.org
> > http://lists.freepascal.org/mailman/listinfo/fpc-devel
> >
> The Pascal huge strings always annoy me.
> Since - it is IMPLICIT automatic object with set of overloaded methods, length
> and reference count fields etc hidden from developer.
> 
> In near future we geat a Zoo of the strings:
> AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar
> Some of them with encoding field.
> 
> Why not to make it EXPLICIT object
> 
> s:=TCoolFPCString.Create('Test');

Nothing stops you from doing this yourself.

But for something as basic as text operations, I think this is bloat.

Imagine that you would have to do
  I:=TInteger.Create(1);
  J:=TInteger.Create(2);
  I.Add(J);
What kind of language do you end up with then ? Utterly unreadable, and 
slow, because heavily relying on the heap.
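
To make the trade-off concrete, here is a rough sketch of such a 
do-it-yourself explicit string class. TExplicitString and its members are 
hypothetical names for illustration only (this is not the TCoolFPCString 
from the proposal, and not an existing RTL class). It assumes a compiler 
with the UnicodeString type and the System routines UTF8Decode and 
UTF8Encode.

```pascal
program explicitstring;
{$mode objfpc}{$H+}

type
  // Hypothetical explicit string object that stores its text as UTF-16.
  TExplicitString = class
  private
    FData: UnicodeString;
    function GetAsUtf8: UTF8String;
  public
    constructor Create(const AUtf8: UTF8String);
    procedure Append(Other: TExplicitString);
    function CharCount: Integer;
    property AsUtf8: UTF8String read GetAsUtf8;
    property AsUnicodeString: UnicodeString read FData;
  end;

constructor TExplicitString.Create(const AUtf8: UTF8String);
begin
  inherited Create;
  FData := UTF8Decode(AUtf8);   // explicit conversion on the way in
end;

function TExplicitString.GetAsUtf8: UTF8String;
begin
  Result := UTF8Encode(FData);  // explicit conversion on the way out
end;

procedure TExplicitString.Append(Other: TExplicitString);
begin
  FData := FData + Other.FData;
end;

function TExplicitString.CharCount: Integer;
begin
  Result := Length(FData);      // counts UTF-16 code units
end;

var
  s, s2: TExplicitString;
begin
  s := TExplicitString.Create('Test');
  s2 := TExplicitString.Create(' string');
  try
    s.Append(s2);
    WriteLn(s.AsUtf8, ' (', s.CharCount, ' code units)');
  finally
    s2.Free;                    // every instance must be freed by hand
    s.Free;
  end;
end.
```

Every instance lives on the heap and has to be freed explicitly, which is 
the cost referred to above. The built-in string types do the equivalent 
bookkeeping implicitly.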

Michael.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Graeme Geldenhuys

On 9/9/08, Anton Kavalenka <[EMAIL PROTECTED]> wrote:
>  The Pascal huge strings always annoy me.
>  Since - it is IMPLICIT automatic object with set of overloaded methods,
> length and reference count fields etc hidden from developer.
>
>  In near future we geat a Zoo of the strings:
>  AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar
>  Some of them with encoding field.

I have to say I agree with you. The Object Pascal / Delphi language
already has way too many string types!  And it's just getting worse.

I've always liked the Java style of everything being an object - even
the string type.


Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-09 Thread Anton Kavalenka


Florian Klaempfl wrote:
> I've continued to work on support of an unicodestring type in fpc. 
> It's currently in an svn branch at:
> http://svn.freepascal.org/svn/fpc/branches/unicodestring
> and will be merged later to trunk. The unicodestring type is a ref. 
> counted utf-16 string. On non-windows, widestring is mapped to this 
> type. If you're interested in unicode support please test, give 
> feedback here and submit fixes.
>
> An existing working copy of trunk can be switched to this branch by
> cd fpc
> svn switch http://svn.freepascal.org/svn/fpc/branches/unicodestring
> and back with
> svn switch http://svn.freepascal.org/svn/fpc/trunk


The Pascal huge strings always annoy me,
since a string is an IMPLICIT automatic object with a set of overloaded 
methods, length and reference count fields etc. hidden from the developer.

In the near future we get a zoo of strings:
AnsiString, WideString, UnicodeString, ShortString, PWideChar, PChar
Some of them with an encoding field.

Why not make it an EXPLICIT object?

s:=TCoolFPCString.Create('Test');
s2:=TCoolFPCString.Create('Проверка'); //UTF8 encoded constant
s.asUtf8+=s2;

SetWindowTextW(WinHandle,s.AsUnicodeString); // I explicitly say: give 
// me a wide string and do NOT do any compiler magic

if (s1.length=length(s2))... // generic runtime function length returns 
// the property of the cool object

s1.AcquireLock // prevent access by other threads
s1.Clear;
s1.LoadFromResource(n_ReasourceId); // just use GNU gettext
s1.LoadTranslationFromResource(n_resID,'be_BY');
s1.ReleaseLock // allow access by other threads

Anyway, I can just subclass the standard string and get new functionality 
with a richness of available fields and methods.

FPC supports operators, so there are lots of ways to represent the 
string, assign the string, load it from a resource.

Make it thread-safe at the implementation level but not at the compiler level.
Standard string, unicode string, ansistring, widestring can be 
implemented as wrappers around this object.

It seems that in mseGUI it is done this way.




Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-07 Thread Martin Schreiber

On Sunday 07 September 2008 21.23:24 Florian Klaempfl wrote:
> >
> > Trunk 11723 does not compile:
>
> Trunk or unicodestring branch? Strange, because here it works?

Unicodestring branch, sorry, I should change the directory name of my switched 
checkout. Does your unicodestring branch compile?

Martin

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-07 Thread Florian Klaempfl


Martin Schreiber wrote:
> On Sunday 07 September 2008 10.58:03 Florian Klaempfl wrote:
>> Martin Schreiber wrote:
>>> On Saturday 06 September 2008 21.08:50 Florian Klaempfl wrote:
>>>> Martin Schreiber wrote:
>>>>> Next problem is that pmsechar(msestring) returns a NIL pointer if
>>>>> msestring = ''. As designed? The behaviour of ansistring and widestring
>>>>> was very useful, I'd like if UnicodeString would behave the same.
>>>>
>>>> Do you have some example code which shows this?
>>
>> Fixed.
>
> Trunk 11723 does not compile:

Trunk or unicodestring branch? Strange, because here it works?

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-07 Thread Martin Schreiber

On Sunday 07 September 2008 10.58:03 Florian Klaempfl wrote:
> Martin Schreiber wrote:
> > On Saturday 06 September 2008 21.08:50 Florian Klaempfl wrote:
> >> Martin Schreiber wrote:
> >>> Next problem is that pmsechar(msestring) returns a NIL pointer if
> >>> msestring = ''. As designed? The behaviour of ansistring and widestring
> >>> was very useful, I'd like if UnicodeString would behave the same.
> >>
> >> Do you have some example code which shows this?
>
> Fixed.

Trunk 11723 does not compile:
"
make[7]: Entering directory `E:/FPC/svn/trunk/rtl/win32'
C:/FPC/2.2.2/bin/i386-Win32/gmkdir.exe -p 
E:/FPC/svn/trunk/rtl/units/i386-win32
C:/FPC/2.2.2/bin/i386-Win32/ppc386.exe -Ur -Xs -O2 -n -Fi../inc -Fi../i386 -Fi..
/win -FE. -FUE:/FPC/svn/trunk/rtl/units/i386-win32 -di386 -dRELEASE -Us -Sg 
system.pp -Fi../win
wustring22.inc(699,27) Fatal: Unknown compilerproc "fpc_char_to_wchar". Check 
if you use the correct run time library.
Fatal: Compilation aborted
make[7]: *** [system.ppu] Fehler 1
make[7]: Leaving directory `E:/FPC/svn/trunk/rtl/win32'
make[6]: *** [win32_all] Fehler 2
make[6]: Leaving directory `E:/FPC/svn/trunk/rtl'
make[5]: *** [rtl] Fehler 2
make[5]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[4]: *** [next] Fehler 2
make[4]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[3]: *** [ppc1.exe] Fehler 2
make[3]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[2]: *** [cycle] Fehler 2
make[2]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[1]: *** [compiler_cycle] Fehler 2
make[1]: Leaving directory `E:/FPC/svn/trunk'
make: *** [build-stamp.i386-win32] Fehler 2
"

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-07 Thread Florian Klaempfl


Martin Schreiber wrote:
> On Saturday 06 September 2008 21.08:50 Florian Klaempfl wrote:
>> Martin Schreiber wrote:
>>> Next problem is that pmsechar(msestring) returns a NIL pointer if
>>> msestring = ''. As designed? The behaviour of ansistring and widestring
>>> was very useful, I'd like if UnicodeString would behave the same.
>>
>> Do you have some example code which shows this?

Fixed.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-06 Thread Martin Schreiber

On Saturday 06 September 2008 21.08:50 Florian Klaempfl wrote:
> Martin Schreiber wrote:
>
> > Next problem is that pmsechar(msestring) returns a NIL pointer if
> > msestring = ''. As designed? The behaviour of ansistring and widestring
> > was very useful, I'd like if UnicodeString would behave the same.
>
> Do you have some example code which shows this?

See attachment.
Test result:
"
F:\proj\testcase\fpc\unicode\punicodechar>punicodechartest.exe
4288048
4288048
0
0
0
An unhandled exception occurred at $004016C5 :
EAccessViolation : Access violation
  $004016C5  main,  line 25 of punicodechartest.pas
"
Martin
program punicodechartest;
{$ifdef FPC}{$mode objfpc}{$h+}{$endif}
{$ifdef mswindows}{$apptype console}{$endif}
uses
 {$ifdef FPC}{$ifdef linux}cthreads,{$endif}{$endif}
 sysutils;
var
 astr: ansistring;
 wstr: widestring;
 ustr: unicodestring;
begin
 astr:= '';
 wstr:= '';
 ustr:= '';
 writeln(ptrint(pansichar(astr)));
 flush(output);
 writeln(ptrint(pwidechar(wstr)));
 flush(output);
 writeln(ptrint(punicodechar(ustr)));
 flush(output);
 writeln(ord(pansichar(astr)^));
 flush(output);
 writeln(ord(pwidechar(wstr)^));
 flush(output);
 writeln(ord(punicodechar(ustr)^));
 flush(output);
end.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-06 Thread Florian Klaempfl


Martin Schreiber wrote:
> On Friday 05 September 2008 22.50:23 Florian Klaempfl wrote:
> [...]
>> This should be fixed.
>
> Thanks, FPC and MSEide compile now.
>
> Attached an "emergency" patch that I could load the MSEgui forms, not finished 
> and not tested. 

Thanks.

> Is TTypekind = (... tkInterfaceRaw,tkUChar,tkUString) 
> correct?

Almost, slightly modified patch is applied.

> Next problem is that pmsechar(msestring) returns a NIL pointer if msestring 
> = ''. As designed? The behaviour of ansistring and widestring was very 
> useful, I'd like if UnicodeString would behave the same.

Do you have some example code which shows this?

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-06 Thread Martin Schreiber

On Friday 05 September 2008 22.50:23 Florian Klaempfl wrote:
[...]
>
> This should be fixed.
>
Thanks, FPC and MSEide compile now.

Attached an "emergency" patch that I could load the MSEgui forms, not finished 
and not tested. Is TTypekind = (... tkInterfaceRaw,tkUChar,tkUString) 
correct?
Next problem is that pmsechar(msestring) returns a NIL pointer if msestring 
= ''. As designed? The behaviour of ansistring and widestring was very 
useful, I'd like if UnicodeString would behave the same.

Thanks, Martin
Index: rtl/objpas/classes/classesh.inc
===
--- rtl/objpas/classes/classesh.inc	(revision 11713)
+++ rtl/objpas/classes/classesh.inc	(working copy)
@@ -899,7 +899,8 @@
 
   TValueType = (vaNull, vaList, vaInt8, vaInt16, vaInt32, vaExtended,
 vaString, vaIdent, vaFalse, vaTrue, vaBinary, vaSet, vaLString,
-vaNil, vaCollection, vaSingle, vaCurrency, vaDate, vaWString, vaInt64, vaUTF8String);
+vaNil, vaCollection, vaSingle, vaCurrency, vaDate, vaWString, vaInt64,
+vaUTF8String,vaUString);
 
   TFilerFlag = (ffInherited, ffChildPos, ffInline);
   TFilerFlags = set of TFilerFlag;
@@ -965,6 +966,7 @@
 function ReadStr: String; virtual; abstract;
 function ReadString(StringType: TValueType): String; virtual; abstract;
 function ReadWideString: WideString;virtual;abstract;
+function ReadUnicodeString: UnicodeString;virtual;abstract;
 procedure SkipComponent(SkipComponentInfos: Boolean); virtual; abstract;
 procedure SkipValue; virtual; abstract;
   end;
@@ -1016,6 +1018,7 @@
 function ReadStr: String; override;
 function ReadString(StringType: TValueType): String; override;
 function ReadWideString: WideString;override;
+function ReadUnicodeString: UnicodeString;override;
 procedure SkipComponent(SkipComponentInfos: Boolean); override;
 procedure SkipValue; override;
   end;
@@ -1101,6 +1104,7 @@
 function ReadBoolean: Boolean;
 function ReadChar: Char;
 function ReadWideChar: WideChar;
+function ReadUnicodeChar: UnicodeChar;
 procedure ReadCollection(Collection: TCollection);
 function ReadComponent(Component: TComponent): TComponent;
 procedure ReadComponents(AOwner, AParent: TComponent;
@@ -1119,6 +1123,7 @@
 function ReadRootComponent(ARoot: TComponent): TComponent;
 function ReadString: string;
 function ReadWideString: WideString;
+function ReadUnicodeString: UnicodeString;
 function ReadValue: TValueType;
 procedure CopyValue(Writer: TWriter);
 property Driver: TAbstractObjectReader read FDriver;
@@ -1170,6 +1175,7 @@
 procedure WriteSet(Value: LongInt; SetType: Pointer); virtual; abstract;
 procedure WriteString(const Value: String); virtual; abstract;
 procedure WriteWideString(const Value: WideString);virtual;abstract;
+procedure WriteUnicodeString(const Value: UnicodeString);virtual;abstract;
   end;
 
   { TBinaryObjectWriter }
@@ -1220,6 +1226,7 @@
 procedure WriteSet(Value: LongInt; SetType: Pointer); override;
 procedure WriteString(const Value: String); override;
 procedure WriteWideString(const Value: WideString); override;
+procedure WriteUnicodeString(const Value: UnicodeString); override;
   end;
 
   TTextObjectWriter = class(TAbstractObjectWriter)
Index: rtl/objpas/classes/reader.inc
===
--- rtl/objpas/classes/reader.inc	(revision 11713)
+++ rtl/objpas/classes/reader.inc	(working copy)
@@ -339,6 +339,25 @@
   end;
 end;
 
+function TBinaryObjectReader.ReadUnicodeString: UnicodeString;
+var
+  len: DWord;
+{$IFDEF ENDIAN_BIG}
+  i : integer;
+{$ENDIF}
+begin
+  len := ReadDWord;
+  SetLength(Result, len);
+  if (len > 0) then
+  begin
+Read(Pointer(@Result[1])^, len*2);
+{$IFDEF ENDIAN_BIG}
+for i:=1 to len do
+  Result[i]:=UnicodeChar(SwapEndian(word(Result[i])));
+{$ENDIF}
+  end;
+end;
+
 procedure TBinaryObjectReader.SkipComponent(SkipComponentInfos: Boolean);
 var
   Flags: TFilerFlags;
@@ -749,6 +768,19 @@
 raise EReadError.Create(SInvalidPropertyValue);
 end;
 
+function TReader.ReadUnicodeChar: UnicodeChar;
+
+var
+  U: UnicodeString;
+  
+begin
+  U := ReadUnicodeString;
+  if Length(U) = 1 then
+Result := U[1]
+  else
+raise EReadError.Create(SInvalidPropertyValue);
+end;
+
 procedure TReader.ReadCollection(Collection: TCollection);
 var
   Item: TCollectionItem;
@@ -1172,7 +1204,7 @@
   SetOrdProp(Instance, PropInfo, Ord(ReadBoolean));
 tkChar:
   SetOrdProp(Instance, PropInfo, Ord(ReadChar));
-tkWChar:
+tkWChar,tkUChar:
   SetOrdProp(Instance, PropInfo, Ord(ReadWideChar));  
 tkEnumeration:
   begin
@@ -1217,6 +1249,8 @@
 FOnReadStringProperty(Self,Instance,PropInfo,TmpStr);
   SetStrProp(Instance, PropInfo, TmpStr);
 end;
+tkUstring:
+  SetUnicodeStrProp(Instance,PropInfo,ReadUnicodeString

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-05 Thread Martin Schreiber

On Friday 05 September 2008 22.50:23 Florian Klaempfl wrote:
> > If you want to try it yourself, MSEide+MSEgui trunk rev. 2473 has
> > msestring = unicodestring if compiled with -dmse_unicodestring.
>
> What's the official way to compile MSE?
>
cd apps\ide
ppc386.exe -Fu..\..\lib\common\* -Fi..\..\lib\common\kernel 
-Fu..\..\lib\common\kernel\i386-win32 mseide.pas

or open apps\ide\mseide.prj in MSEide with 'Project'-'Open', 'Project'-'Make'.

In order to test UnicodeString the command line is:
ppc386.exe -dmse_unicodestring -Fu..\..\lib\common\* -Fi..\..\lib\common\kernel 
-Fu..\..\lib\common\kernel\i386-win32 mseide.pas

If you want to debug the compiler with MSEide add the compiler source
directories to 'Project'-'Options'-'Debugger'-'Source directories'.
From an older post of this list:
"
This is for MSEide i386 and FPC 2.3.1:
http://sourceforge.net/project/showfiles.php?group_id=165409

- 'Project'-'New'-'From Program'.
- Select "compiler/pp.pas" from your FPC SVN checkout.
- Accept "pp.prj".
- 'Project'-'Options'-'Make'-'Make options', add "-di386" (without quotes) to 
the first row of 'Command line options'.
- 'Project'-'Options'-'Make'-'Directories', add a row, select "compiler/i386/"
- Add a row with "/compiler/x86/".
- Add a row with "/compiler/systems/".
- Set the commandline parameters for the target in 'Target'-'Environment'.
- Press F9.

You should possibly change the unit output directory to be consistent with the 
make file.
"

Martin
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-05 Thread Florian Klaempfl


Martin Schreiber wrote:
> Florian,
>
> On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote:
> > I've continued to work on support of an unicodestring type in fpc. It's
> > currently in an svn branch at:
> > http://svn.freepascal.org/svn/fpc/branches/unicodestring
> > and will be merged later to trunk. The unicodestring type is a ref.
> > counted utf-16 string. On non-windows, widestring is mapped to this
> > type. If you're interested in unicode support please test, give feedback
> > here and submit fixes.
> >
> I tried the unicode branch on Windows, rev. 11711 does not compile:
>
> make[7]: Entering directory `E:/FPC/svn/trunk/rtl/win32'
> E:/FPC/svn/trunk/compiler/ppc1.exe -Ur -Xs -O2 -n -Fi../inc -Fi../i386 -Fi../win
>  -FE. -FUE:/FPC/svn/trunk/rtl/units/i386-win32 -di386 -dRELEASE -Us -Sg
> system.pp -Fi../win
> wstrings.inc(1655,60) Error: Identifier not found "CharLengthPChar"
> ustrings.inc(2147,42) Error: Incompatible types: got "procedure(PChar,var UnicodeString, LongInt);Register>" expected "procedure(PChar,var WideString, LongInt);Register>"
> ustrings.inc(2148,44) Error: Incompatible types: got "function(const UnicodeString):UnicodeString;Register>" expected "function(const WideString):WideString;Register>"
> ustrings.inc(2149,44) Error: Incompatible types: got "function(const UnicodeString):UnicodeString;Register>" expected "function(const WideString):WideString;Register>"
> ustrings.inc(2151,46) Error: Incompatible types: got "function(const UnicodeString,const UnicodeString):LongInt;Register>" expected "variable type of function(const WideString,const WideString):LongInt;Register>"
> ustrings.inc(2152,50) Error: Incompatible types: got "function(const UnicodeString,const UnicodeString):LongInt;Register>" expected "variable type of function(const WideString,const WideString):LongInt;Register>"
> system.pp(1253) Fatal: There were 6 errors compiling module, stopping
> Fatal: Compilation aborted
> make[7]: *** [system.ppu] Fehler 1
> make[7]: Leaving directory `E:/FPC/svn/trunk/rtl/win32'
> make[6]: *** [win32_all] Fehler 2
> make[6]: Leaving directory `E:/FPC/svn/trunk/rtl'
> make[5]: *** [rtl] Fehler 2
> make[5]: Leaving directory `E:/FPC/svn/trunk/compiler'
> make[4]: *** [next] Fehler 2
> make[4]: Leaving directory `E:/FPC/svn/trunk/compiler'
> make[3]: *** [ppc2.exe] Fehler 2
> make[3]: Leaving directory `E:/FPC/svn/trunk/compiler'
> make[2]: *** [cycle] Fehler 2
> make[2]: Leaving directory `E:/FPC/svn/trunk/compiler'
> make[1]: *** [compiler_cycle] Fehler 2
> make[1]: Leaving directory `E:/FPC/svn/trunk'
> make: *** [build-stamp.i386-win32] Fehler 2

This should be fixed.

> Compiling MSEide with rev. 11667 I get:
> Free Pascal Compiler version 2.3.1 [2008/09/05] for i386
> Copyright (c) 1993-2008 by Florian Klaempfl
> Target OS: Win32 for i386
> Compiling mseide.pas
> [...]
> msestream.pas(762,2) Warning: Class types "tmsefilestream"
> and "THandleStreamcracker" are not related
> msestream.pas(785,33) Warning: Class types "tmsefilestream"
> and "THandleStreamcracker" are not related
> msestream.pas(810,34) Warning: Class types "tmsefilestream"
> and "THandleStreamcracker" are not related
> msesysintf.pas(1552,21) Fatal: Unknown
> compilerproc "fpc_widechararray_to_unicodestr". Check if you use the correct
> run time library.
> Fatal: Compilation aborted
>
> If you want to try it yourself, MSEide+MSEgui trunk rev. 2473 has msestring =
> unicodestring if compiled with -dmse_unicodestring.

What's the official way to compile MSE?

> I found no UnicodeString support in typeinfo and variants?

Indeed, this must be added.

> What are the plans for Unicode resourcestrings?

Not decided yet.

> TField should probably have an
> asUnicodeString property too.



Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-05 Thread Martin Schreiber

Florian,

On Saturday 30 August 2008 13.37:42 Florian Klaempfl wrote:
> I've continued to work on support of an unicodestring type in fpc. It's
> currently in an svn branch at:
> http://svn.freepascal.org/svn/fpc/branches/unicodestring
> and will be merged later to trunk. The unicodestring type is a ref.
> counted utf-16 string. On non-windows, widestring is mapped to this
> type. If you're interested in unicode support please test, give feedback
> here and submit fixes.
>
I tried the unicode branch on Windows, rev. 11711 does not compile:

make[7]: Entering directory `E:/FPC/svn/trunk/rtl/win32'
E:/FPC/svn/trunk/compiler/ppc1.exe -Ur -Xs -O2 -n -Fi../inc -Fi../i386 -Fi../win
 -FE. -FUE:/FPC/svn/trunk/rtl/units/i386-win32 -di386 -dRELEASE -Us -Sg 
system.pp -Fi../win
wstrings.inc(1655,60) Error: Identifier not found "CharLengthPChar"
ustrings.inc(2147,42) Error: Incompatible types: got "" expected ""
ustrings.inc(2148,44) Error: Incompatible types: got "" expected ""
ustrings.inc(2149,44) Error: Incompatible types: got "" expected ""
ustrings.inc(2151,46) Error: Incompatible types: got "" expected ""
ustrings.inc(2152,50) Error: Incompatible types: got "" expected ""
system.pp(1253) Fatal: There were 6 errors compiling module, stopping
Fatal: Compilation aborted
make[7]: *** [system.ppu] Fehler 1
make[7]: Leaving directory `E:/FPC/svn/trunk/rtl/win32'
make[6]: *** [win32_all] Fehler 2
make[6]: Leaving directory `E:/FPC/svn/trunk/rtl'
make[5]: *** [rtl] Fehler 2
make[5]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[4]: *** [next] Fehler 2
make[4]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[3]: *** [ppc2.exe] Fehler 2
make[3]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[2]: *** [cycle] Fehler 2
make[2]: Leaving directory `E:/FPC/svn/trunk/compiler'
make[1]: *** [compiler_cycle] Fehler 2
make[1]: Leaving directory `E:/FPC/svn/trunk'
make: *** [build-stamp.i386-win32] Fehler 2

Compiling MSEide with rev. 11667 I get:
Free Pascal Compiler version 2.3.1 [2008/09/05] for i386
Copyright (c) 1993-2008 by Florian Klaempfl
Target OS: Win32 for i386
Compiling mseide.pas
[...]
msestream.pas(762,2) Warning: Class types "tmsefilestream" 
and "THandleStreamcracker" are not related
msestream.pas(785,33) Warning: Class types "tmsefilestream" 
and "THandleStreamcracker" are not related
msestream.pas(810,34) Warning: Class types "tmsefilestream" 
and "THandleStreamcracker" are not related
msesysintf.pas(1552,21) Fatal: Unknown 
compilerproc "fpc_widechararray_to_unicodestr". Check if you use the correct 
run time library.
Fatal: Compilation aborted

If you want to try it yourself, MSEide+MSEgui trunk rev. 2473 has msestring = 
unicodestring if compiled with -dmse_unicodestring.
I found no UnicodeString support in typeinfo and variants?
What are the plans for Unicode resourcestrings? TField should probably have an 
asUnicodeString property too.

Thank you very much for your work.

Martin

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-01 Thread Marco van de Voort

In our previous episode, Marc Weustink said:
> OK, then we name it objects (or records with methods)
> 
> > Before you know it you are messing with special stringbuilder classes and
> > special syntax to keep a semblance of performance. Moreover I don't really
> > see what this solves.
> 
> It solves the case that you want to have records/objects with non 
> standard initialisation/finalisation code. Refcounted assignments etc.

Yes, it allows more leeway for DIY. But I let the fact the common situation
gets messier prevail over that.


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-01 Thread Marc Weustink


Marco van de Voort wrote:
> In our previous episode, Ivo Steinmann said:
> > Why not creating a new kind of managed class, that is refcounted,
> > initialized, finalized, etc... like String type?
>
> I never liked string-types as classes. They feel like cheap imitations of a
> real string type.

OK, then we name it objects (or records with methods)

> Before you know it you are messing with special stringbuilder classes and
> special syntax to keep a semblance of performance. Moreover I don't really
> see what this solves.

It solves the case that you want to have records/objects with non 
standard initialisation/finalisation code. Refcounted assignments etc.

If this were possible, I would have used it for the utf encoding of 
strings and for the wincontrol.handle/reference in lazarus.


Marc


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-01 Thread Marco van de Voort

In our previous episode, Ivo Steinmann said:
> Why not creating a new kind of managed class, that is refcounted, 
> initialized, finalized, etc... like String type?

I never liked string-types as classes. They feel like cheap imitations of a
real string type.

Before you know it you are messing with special stringbuilder classes and
special syntax to keep a semblance of performance. Moreover I don't really
see what this solves.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-01 Thread Ivo Steinmann


Marco van de Voort wrote:
> In our previous episode, Luiz Americo Pereira Camara said:
> > >>
> > >> And use TNativeString for encoding agnostic purposes.
> > >
> > > Well, really agnostic code should simply use "string" :)
> >
> > Delphi is introducing the RawByteString type, that skips the auto
> > encoding conversion. I don't know where it fits in the upcoming unicode
> > schema. Anyway there's an example how to use it:
> > http://www.micro-isv.asia/2008/08/using-rawbytestring-effectively/
>
> That's the lowlevel agnostic way. I'm talking more for purposes like classes
> libraries, that will want to use a native type on both conventions, but will
> generally operate on the strings on a relatively high level.

Why not create a new kind of managed class that is refcounted, 
initialized, finalized, etc., like the String type? Then create a string 
type on this managed class, so String becomes a class. For unicode 
strings you create a descendant of this class with unicode 
implementations. This way it's still compatible with the base class String.


The managed class should follow these rules:



 EXAMPLE ---

procedure Param(S: TManagedClass);
procedure VarParam(var S: TManagedClass);
procedure OutParam(out S: TManagedClass);
procedure ConstParam(const S: TManagedClass);

var
 S, T: TManagedClass;

begin
{c} S := nil; // compiler code
{c} T := nil;

S := 'abcd';
{c} tmp := TManagedClass.Create('abcd');
{c} // Assign is a function that takes another managed class to assign and
{c} // returns a new instance or reference (Self can be nil).
{c} S := S.Assign(tmp);

{c} tmp.Release;

Param(S);
{c} S.NewRef;  // NewRef is a function that increases the reference count
{c} Param(S);
{c} S.Release;

VarParam(S);
{c} VarParam(S);

OutParam(S);
{c} S.Release;
{c} S := nil;
{c} OutParam(S);

ConstParam(S);
{c} ConstParam(S);

T := S;
{c} T := T.Assign(S);

T := S + 'abcd' + S;
{c} eg. TManagedClass.Append function

{c} S.Release;
{c} T.Release;
end;



END -

I'm aware that a lot of these things could also be implemented as 
overloaded operators. But NOT the initialization, finalization and 
parameter handling part.
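
For comparison, a minimal sketch of how the initialization, finalization and
copy hooks can be written by hand, assuming a much later FPC release (3.2 or
newer) that provides management operators for advanced records. This is not
the scheme proposed above and is not part of the unicodestring branch; the
names TManaged, Live and Demo are hypothetical and only serve the
illustration.

```pascal
program managedrec;
{$mode objfpc}{$H+}
{$modeswitch advancedrecords}

type
  // Hypothetical record whose lifetime hooks are written by the user.
  TManaged = record
    Value: Integer;
    class operator Initialize(var R: TManaged);
    class operator Finalize(var R: TManaged);
    class operator Copy(constref Src: TManaged; var Dst: TManaged);
  end;

var
  Live: Integer = 0; // number of TManaged instances currently initialized

class operator TManaged.Initialize(var R: TManaged);
begin
  // Called by the compiler when a variable of this type comes into scope.
  R.Value := 0;
  Inc(Live);
end;

class operator TManaged.Finalize(var R: TManaged);
begin
  // Called by the compiler when the variable goes out of scope.
  Dec(Live);
end;

class operator TManaged.Copy(constref Src: TManaged; var Dst: TManaged);
begin
  // Called by the compiler for "Dst := Src".
  Dst.Value := Src.Value;
end;

procedure Demo;
var
  S, T: TManaged;
begin
  S.Value := 42;
  T := S; // goes through the Copy operator
  WriteLn('live inside Demo: ', Live);
  WriteLn('T.Value = ', T.Value);
end;

begin
  Demo;
  WriteLn('live after Demo: ', Live);
end.
```

The counter is only there to make the compiler-inserted calls visible; a real
refcounted type would manage a shared payload in these hooks instead.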


Assign may also be a problem, because I need something like this:

operator := (var Dest: TManagedClass; Src: TManagedClass);

In case of an assignment, the destination pointer changes. But before I can 
change the destination pointer, I have to release the old reference, and 
that's not possible with:


operator := (Src: TManagedClass) Dest: TManagedClass;



-Ivo

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-09-01 Thread Marco van de Voort

In our previous episode, Luiz Americo Pereira Camara said:
> >>
> >> And use TNativeString for encoding agnostic purposes.
> >
> > Well, really agnostic code should simply use "string" :)
> 
> Delphi is introducing  the RawByteString type, that skips the auto 
> encoding conversion. I don't know where it fits in the upcoming unicode 
> schema. Anyway there's an example how to use it: 
> http://www.micro-isv.asia/2008/08/using-rawbytestring-effectively/

That's the lowlevel agnostic way. I'm talking more for purposes like classes
libraries, that will want to use a native type on both conventions, but will
generally operate on the strings on a relatively high level.

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-08-31 Thread Ivo Steinmann


Hello ;)

I'm trying to test your new string type now :) but after switching to the new 
branch, I couldn't compile fpc ^^



BTW:
A year ago, I wrote a complete Unicode string library for a possible new 
built-in string type. My idea was to create a new built-in type with an 
additional flag. This flag stored the encoding of the string. By default 
(if the flag is not explicitly set by the user) it was the current 
system charmap.

I also wrote functions that could convert from any charmap to any other 
(utf8, utf16, ucs2, ucs4, iso8859, codepages, ascii, etc). If the user 
concatenated two strings, one encoded in ascii and one in utf8, the 
resulting string was utf8:


S1: Unistring;
S2: Unistring;
S3: Unistring;

S1 := 'hello' as ascii;
S2 := 'foobar' as utf8;
S3 := S1 + S2;

S3 was UTF8

+ the string can hold any kind of charmap and the string manager is 
aware of that
- additional flag required
+ the optimal encoding is always used
+ the user doesn't have to care about encoding (except when reading from 
sources with different encodings, like textfiles)
- maybe some extra encode/decode work required
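
The concatenation rule described above can be sketched in plain Pascal,
without the hypothetical "as ascii" / "as utf8" syntax. This is only an
illustration under simplifying assumptions, not the library mentioned in this
mail: the names TCharmap, TTaggedString, Tagged and Concat2 are made up, and
only two charmaps are modelled, so no real re-encoding is needed (ASCII is a
byte-for-byte subset of UTF-8).

```pascal
program taggedconcat;
{$mode objfpc}{$H+}

type
  // Hypothetical encoding flag; only two charmaps for the illustration.
  TCharmap = (cmAscii, cmUtf8);

  // Hypothetical tagged string: the bytes plus the flag describing them.
  TTaggedString = record
    Charmap: TCharmap;
    Bytes: AnsiString;
  end;

function Tagged(const S: AnsiString; C: TCharmap): TTaggedString;
begin
  Result.Charmap := C;
  Result.Bytes := S;
end;

function Concat2(const A, B: TTaggedString): TTaggedString;
begin
  // The result takes the wider of the two charmaps. ASCII bytes are already
  // valid UTF-8, so the bytes can be joined without conversion.
  if (A.Charmap = cmUtf8) or (B.Charmap = cmUtf8) then
    Result.Charmap := cmUtf8
  else
    Result.Charmap := cmAscii;
  Result.Bytes := A.Bytes + B.Bytes;
end;

var
  S1, S2, S3: TTaggedString;
begin
  S1 := Tagged('hello', cmAscii);
  S2 := Tagged('foobar', cmUtf8);
  S3 := Concat2(S1, S2);
  if S3.Charmap = cmUtf8 then
    WriteLn(S3.Bytes, ' is tagged utf8')
  else
    WriteLn(S3.Bytes, ' is tagged ascii');
end.
```

With more charmaps, Concat2 would have to pick a common target encoding and
convert one or both operands first, which is the extra encode/decode work
listed as a drawback.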

-Ivo Steinmann


Florian Klaempfl wrote:
> I've continued to work on support of an unicodestring type in fpc.
> It's currently in an svn branch at:
>
> http://svn.freepascal.org/svn/fpc/branches/unicodestring
> and will be merged later to trunk. The unicodestring type is a ref.
> counted utf-16 string. On non-windows, widestring is mapped to this
> type. If you're interested in unicode support please test, give
> feedback here and submit fixes.
>
> An existing working copy of trunk can be switched to this branch by
> cd fpc
> svn switch http://svn.freepascal.org/svn/fpc/branches/unicodestring
> and back with
> svn switch http://svn.freepascal.org/svn/fpc/trunk




Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-08-31 Thread Luiz Americo Pereira Camara


Daniël Mantione wrote:
> On Sat, 30 Aug 2008, Marco van de Voort wrote:
> > So then you can (hopefully) pretty much do
> >
> > {$ifdef unix} // in reality it is more complicated than ifdef unix, but for now..
> > TNativeString = type ansistring (CP_UTF8);
> > {$else}
> > TNativeString = type TUnicodeString;
> > {$endif}
> >
> > And use TNativeString for encoding agnostic purposes.
>
> Well, really agnostic code should simply use "string" :)

Delphi is introducing the RawByteString type, that skips the auto 
encoding conversion. I don't know where it fits in the upcoming unicode 
schema. Anyway there's an example how to use it: 
http://www.micro-isv.asia/2008/08/using-rawbytestring-effectively/


Luiz

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-08-30 Thread Florian Klaempfl


Graeme Geldenhuys wrote:
> On Sat, Aug 30, 2008 at 4:07 PM, Florian Klaempfl
> <[EMAIL PROTECTED]> wrote:
> >>
> >> I don't know what is "core", so would you mind forwarding the related
> >> messages to this group?
> >
> > Not really, we had enough useless and time wasting discussions about this.
>
> Still doesn't answer my question as to what "core" is... Is that a
> different mailing list to fpc-devel?

Yes, invite only mailing list for active developers.

> If so, is there an archive I can
> search?

No :)

> I would just like to know the arguments (background
> information) for both utf-[8,16]

utf-8 will be supported by an extended ansistring which will support 
different encodings.
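
To make the two representations concrete, here is a small sketch that holds
the same text as a UTF-16 unicodestring and as UTF-8 bytes. It assumes a
later FPC release (3.0 or newer) where UTF8String is a code-page-aware
ansistring; whether the branch discussed here already behaves this way is not
stated in this thread. The variable names are illustrative only.

```pascal
program utfkinds;
{$mode objfpc}{$H+}

var
  U16: UnicodeString; // UTF-16, one or two 16-bit code units per character
  U8: UTF8String;     // UTF-8, one to four bytes per character
begin
  // 'caf' followed by U+00E9, built from a code point so the example does
  // not depend on the encoding of the source file.
  U16 := UnicodeString('caf') + WideChar($00E9);
  // UTF8Encode is a System unit function converting UTF-16 to UTF-8.
  U8 := UTF8Encode(U16);
  WriteLn('UTF-16 code units: ', Length(U16));
  WriteLn('UTF-8 bytes:       ', Length(U8));
end.
```

Length counts storage units rather than characters in both cases: the text is
4 code units in UTF-16 and 5 bytes in UTF-8, because U+00E9 needs two bytes
in UTF-8.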


Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-08-30 Thread Graeme Geldenhuys

On Sat, Aug 30, 2008 at 4:07 PM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
>>
>> I don't know what is "core", so would you mind forwarding the related
>> messages to this group?
>
> Not really, we had enough useless and time wasting discussions about this.

Still doesn't answer my question as to what "core" is... Is that a
different mailing list to fpc-devel?  If so, is there an archive I can
search?  I would just like to know the arguments (background
information) for both utf-[8,16]

Regards,
 - Graeme -

___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-08-30 Thread Florian Klaempfl


JoshyFun wrote:
> Hello Florian,
>
> Saturday, August 30, 2008, 1:37:42 PM, you wrote:
>
> FK> I've continued to work on support of an unicodestring type in fpc. It's
> FK> currently in an svn branch at:
> FK> http://svn.freepascal.org/svn/fpc/branches/unicodestring
> FK> and will be merged later to trunk. The unicodestring type is a ref.
> FK> counted utf-16 string. On non-windows, widestring is mapped to this
> FK> type. If you're interested in unicode support please test, give feedback
> FK> here and submit fixes.
>
> I'm writing some unicode support functions, they are mostly based on
> the current WideString format. Is there any important technical
> difference which could prevent the current code from working like the
> WideString one?

This depends on the code and that's why there is this branch :)

Re: [fpc-devel] Unicodestring branch, please test and help fixing

2008-08-30 Thread Daniël Mantione




On Sat, 30 Aug 2008, Marco van de Voort wrote:

> In our previous episode, Michael Van Canneyt said:
> > > and back with
> > > svn switch http://svn.freepascal.org/svn/fpc/trunk
> >
> > What happened to the idea of dynamical encoding ? And why utf-16 ? Unix
> > uses UTF-8 by default, which means that a conversion must be done each
> > time you interface to the OS ?
>
> I assume this means Tiburon UTF-8 extension to ansistring follows on this
> change.
>
> So then you can (hopefully) pretty much do
>
> {$ifdef unix} // in reality it is more complicated than ifdef unix, but for now..
> TNativeString = type ansistring (CP_UTF8);
> {$else}
> TNativeString = type TUnicodeString;
> {$endif}
>
> And use TNativeString for encoding agnostic purposes.

Well, really agnostic code should simply use "string" :)

Daniël
