Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-19 Thread Michael Schnell

On 09/18/2011 06:49 PM, DaWorm wrote:

> But isn't it O(n^2) only when actually using unicode strings?

Allowing the compiler or library to decide _if_ this is a Unicode string 
would require either dedicated string types for each encoding or "New 
Strings" with programmable encoding.


-Michael
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-19 Thread Michael Schnell

On 09/18/2011 05:52 PM, Marco van de Voort wrote:


> And of course, finally, there is the matter with Delphi compatibility.

This can't even be discussed with regard to Unicode programming as long as 
FPC does not have "new Strings".


(AFAIK there are, or have been, discussions about not doing new strings 
at all, as they don't completely solve the problems they are supposed to 
solve. In the end that would rule out Delphi compatibility, whether or 
not that is a problem.)


-Michael


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-19 Thread Michael Schnell

On 09/19/2011 11:13 AM, Marco van de Voort wrote:


> No. IMHO the point has always been to find a sweet spot. Delphi is not
> Visual Basic. Delphi is native and fast.

Isn't this nicely provided by "new Strings"?

If you are naive and just use them the way you were used to in ANSI 
times, your programs might get slow when using MyString[i].


But if you want to speed things up, you can either replace MyString[i] 
with a more suitable (and more complex) iterator implementation, or set 
the type of the strings in question to an appropriate ANSI variant 
(converting them if necessary).


-Michael


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-19 Thread Hans-Peter Diettrich

Flávio Etrusco wrote:

>> IMHO you are seeking the problems in the tools, while the problem is PEBKAC
>
> I partly agree it's PEBKAC, but why make it easy to get wrong when you
> can avoid it? Isn't that the point of Pascal?

Many people think that Pascal is an educational (toy) language, and 
inefficient code will prove them right :-(


DoDi



Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-19 Thread Marco van de Voort
In our previous episode, Flávio Etrusco said:
> > IMHO you are seeking the problems in the tools, while the problem is PEBKAC
> 
> I partly agree it's PEBKAC, but why make it easy to get wrong when you
> can avoid it?

The point is you can't. You only keep the illusion you can marginally longer
at a gigantic price.

> Isn't that the point of Pascal?

No. IMHO the point has always been to find a sweet spot. Delphi is not
Visual Basic. Delphi is native and fast.

> Isn't that the point of AnsiStrings?  Isn't that the point of strong typed
> languages in general?

IMHO you are just dragging in random terms here. Variable encodings were not
foreseen in the design, so one can't tell how to deal with them from history.
 
> > I don't like the Java/C# way that you have to manually allocate extra
> > objects (stringbuilders etc) to get(performant) access to the characters 
> > though.
> 
> At least we are not tied by bytecode & VM, so I think we can make better ;-)

Not everything is black/white.

> Is it unthinkable to have the basic/native string type an object?

Probably not. C++ has it, but a rough idea is next to useless. The devil
(and the brilliance) is in the details and border conditions. See e.g.

http://www.freepascal.org/faq.var#extensionselect

and then especially the second half on how to make a proposal.


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-19 Thread Jonas Maebe

On 19 Sep 2011, at 10:27, Flávio Etrusco wrote:

> I partly agree it's PEBKAC, but why make it easy to get wrong when you
> can avoid it? Isn't that the point of Pascal? Isn't that the point of
> AnsiStrings? Isn't that the point of strong typed languages in
> general?

Yes, but supporting unicode processing in a way that the user does not have to 
know about unicode is not possible. Even if everything were UTF-32, you could 
still have characters where the diacritics are separated from the characters 
they belong with (or should the iteration also temporarily normalize the 
string?).

Adding band aids that make plain indexing extremely slow to solve some problems 
and then still requiring people to write different code to get things working 
right in general is not the point of Pascal or strongly typed languages. 
Generally, there is a quick&easy way that is fragile and a somewhat slower and 
more difficult way that is correct. Having something slower that is still 
fragile does not belong in this picture. Especially not since indexing strings 
has always accessed the individual bytes/widechars, so it would also make 
things more confusing for people who have been using Pascal for a long time.

Furthermore it would be Delphi-incompatible (yet more confusion) and it would 
require "char" to become a 32 bit type since otherwise it would not be possible 
to represent every indexed string code point using a char (if any sort of 
consistency is desired, it should be possible to assign a string element to a 
char variable without data loss, since a basic Pascal convention that has held 
forever is that a string is conceptually a packed array of char).
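
The point about separated diacritics can be made concrete with a minimal sketch. It assumes the strings hold UTF-8 bytes in an AnsiString; the helper CodePointCount is a hypothetical name, not an RTL routine. Both strings below display as the same accented "e", yet they differ in code point count and are not equal bytewise, so even code-point-exact indexing would not give "characters" without normalization.

```pascal
program CombiningMarkDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: counts code points in well-formed UTF-8 by
// skipping continuation bytes (those of the form 10xxxxxx).
function CodePointCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Precomposed, Decomposed: AnsiString;
begin
  Precomposed := #$C3#$A9;     // U+00E9, one code point
  Decomposed := 'e'#$CC#$81;   // U+0065 followed by combining U+0301
  WriteLn(CodePointCount(Precomposed));   // 1
  WriteLn(CodePointCount(Decomposed));    // 2
  WriteLn(Precomposed = Decomposed);      // FALSE
end.
```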


Jonas


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-19 Thread Flávio Etrusco
On Mon, Sep 19, 2011 at 4:36 AM, Marco van de Voort  wrote:
> In our previous episode, Flávio Etrusco said:
>> compatibility feature, and as such should care more about correctness
>> and ease-of-use rather than performance. I thought the endless bugs
>> WRT to char vs codepoint indexes, even in Java-developed software,
>> would buy my argument...
>
> IMHO you are seeking the problems in the tools, while the problem is PEBKAC

I partly agree it's PEBKAC, but why make it easy to get wrong when you
can avoid it? Isn't that the point of Pascal? Isn't that the point of
AnsiStrings? Isn't that the point of strong typed languages in
general?

> The base principle should be to mess with strings as little as possible, in
> that sense you are right.
>
> I don't like the Java/C# way that you have to manually allocate extra
> objects (stringbuilders etc) to get(performant) access to the characters 
> though.

At least we are not tied by bytecode & VM, so I think we can make better ;-)
Is it unthinkable to have the basic/native string type an object?

Best regards,
Flávio


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-19 Thread Jonas Maebe

On 19 Sep 2011, at 09:36, Marco van de Voort wrote:

> I don't like the Java/C# way that you have to manually allocate extra
> objects (stringbuilders etc) to get(performant) access to the characters 
> though.

In Java that's only the case for changing characters. Reading characters 
happens via a simple accessor and since the java.lang.String class is final it 
can be inlined from the start without any problems.


Jonas


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-19 Thread Marco van de Voort
In our previous episode, Flávio Etrusco said:
> compatibility feature, and as such should care more about correctness
> and ease-of-use rather than performance. I thought the endless bugs
> WRT to char vs codepoint indexes, even in Java-developed software,
> would buy my argument...

IMHO you are seeking the problems in the tools, while the problem is PEBKAC

The base principle should be to mess with strings as little as possible, in
that sense you are right.

I don't like the Java/C# way that you have to manually allocate extra
objects (stringbuilders etc) to get (performant) access to the characters though.


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread Flávio Etrusco
On Sun, Sep 18, 2011 at 11:45 AM, Jonas Maebe  wrote:
>
> On 18 Sep 2011, at 13:57, Flávio Etrusco wrote:
>
>> One obvious way to mitigate this would be to store the last
>> CodePoint->Char in the string record, so that at least the most common
>> case is covered.
>
> ... and so that the common case is broken in multithreaded environments.
>
> Directly indexing a string will most likely always work using fixed-length 
> steps (8, 16, 32 bit).
> If you want to iterate based on anything else (such as code points), use some 
> kind of
> iterator model instead.
>
> Jonas

By "the most common case" I meant non-threaded ;-) But no, I don't see
any trivial and efficient solution to avoid the worst case (but among
threadvars, per-string fixed lookup table, shared lookup caches,
per-reference data (like Object), etc, there must be a good solution).
Basically I think the UnicodeString should move farther (than
AnsiString) away from PChar, from the compiler/RTL POV.
I think that the user should (have to) use the iterator model to
*efficiently* iterate over the string, but I see indexed access as a
compatibility feature, and as such should care more about correctness
and ease-of-use rather than performance. I thought the endless bugs
WRT to char vs codepoint indexes, even in Java-developed software,
would buy my argument...

-Flávio


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread Hans-Peter Diettrich

DaWorm wrote:

> On Sun, Sep 18, 2011 at 12:01 PM, Sven Barth wrote:
>> On 18.09.2011 17:48, DaWorm wrote:
>
> But isn't it O(n^2) only when actually using unicode strings?


All MBCS encodings, with no fixed character size, suffer from that problem.


> Wouldn't you also be able to do something like String.Encoding := Ansi
> and then all String[i] accesses would then be o(n) + x (where x is the
> overhead of run time checking that it is safe to just use a memory
> offset, presumably fairly short)? Of course it would be up to the user
> to choose to reencode some string he got from the RTL or FCL that way
> and understand the consequences.


Calling subroutines for indexed access, instead of direct array access, 
will add another factor (10..100?) to single character access - 
including register save/restore and disallowed optimizations.



> What assumptions are the typical String[i] user going to make about
> what is returned?  There will be the types that are seeing if the
> fifth character is a 'C' or something like that, and for those there
> probably isn't too much that is going to go wrong, they might have to
> switch to "C" instead, or the compiler can make the 'C' literal a
> "unicode char which is really a string" conversion at compile time.
> There may be the ones that want to turn a 'C' into a 'c' by flipping
> the 6th bit, and that will indeed break, and in a Unicode world,
> perhaps that should break, forcing using LowerCase as needed.


The simple upper/lower conversion works only for ASCII, not for Ansi chars.


> And
> there are those (such as myself) who often use strings as buffers for
> things like serial comms.  That code will totally break if I were to
> try to use a unicode string buffer, but a simple addition of
> String.Encoding := ANSI or RawByteString or ShortString in the first
> line would fix that, or I could bite the bullet and recode that quick
> and dirty code the right way.


Delphi introduced TBytes for non-character byte data.


> My point is that trying to keep the bad
> habits of a single byte string world in a unicode world is
> counterproductive.  They aren't the same, and all attempts to make
> them the same just cause more problems than they solve.


That's why I still suggest to use UTF-16 in user code. When the user 
skips all unknown chars, nothing can go wrong.



> As for the RTL and FCL, presumably they wouldn't be doing any of this
> String[i] stuff in the first place, would they? So they aren't going to
> suffer that speed penalty.  Just because one type of code is slow,
> doesn't mean everything is slow.


It's absolutely safe, even with UTF-8 strings, to e.g. search for all 
'\' separators, and to replace these in place with '/'. It's also safe 
to search for a set of (ASCII) separator chars, and to split strings at 
these positions (e.g. CSV). Bytewise case-insensitive comparison also 
works for all encodings, at least for equality. Other comparisons are 
much slower, due to the required lookup of the sort order values (maybe 
alphabetic, dictionary etc.), and again with every encoding. Even with 
ASCII there exists a choice of sorting 'a' like 'A', after 'A' or after 'Z'.
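
The separator case can be sketched as follows. It relies on a property of UTF-8: every byte of a multi-byte sequence has its high bit set, so a byte below $80 such as '\' can never be part of one. The procedure name FixSeparators and the sample path are made up for illustration, and the string is assumed to carry UTF-8 bytes in an AnsiString.

```pascal
program SeparatorReplaceDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: replaces every '\' with '/' in place, byte by byte.
// No decoding is needed because '\' ($5C) cannot occur inside a
// multi-byte UTF-8 sequence.
procedure FixSeparators(var S: AnsiString);
var
  I: Integer;
begin
  for I := 1 to Length(S) do
    if S[I] = '\' then
      S[I] := '/';
end;

var
  P: AnsiString;
begin
  // Sample path with two non-ASCII letters, written as UTF-8 bytes.
  P := 'C:\caf'#$C3#$A9'\men'#$C3#$BC'.txt';
  FixSeparators(P);
  WriteLn(Pos('\', P));                               // 0: none left
  WriteLn(P = 'C:/caf'#$C3#$A9'/men'#$C3#$BC'.txt');  // TRUE: other bytes intact
end.
```

The same reasoning does not carry over to legacy multi-byte code pages in which trail bytes may fall into the ASCII range.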


DoDi



Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread cobines
2011/9/18 Marco van de Voort :
>  The trouble is that it is not that easy, consider the first thing a
> long time pascal user will do is fix his existing code which has many
> constructs that loop over a string:
>
> setlength(s2,length(s1));
> for i:=1 to length(s1) do
>  s2[i]:=s1[i];
>
> Now, to return codepoint[i], you need to parse all codepoints before [i].

Correct me if I'm wrong, but length(s1) wouldn't return the number of
code points anyway?
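
A small sketch of that observation, assuming UTF-8 bytes held in an AnsiString: Length reports bytes (code units), so the quoted loop, with SetLength given an actual length, copies bytes rather than code points and still yields an identical string. CopyByIndex is a name invented for this example.

```pascal
program LengthDemo;
{$mode objfpc}{$H+}

// The index-based copy loop from the quoted message, wrapped in a
// function (hypothetical name) so that its result can be compared.
function CopyByIndex(const S1: AnsiString): AnsiString;
var
  I: Integer;
begin
  SetLength(Result, Length(S1));
  for I := 1 to Length(S1) do
    Result[I] := S1[I];
end;

var
  S: AnsiString;
begin
  S := 'na'#$C3#$AF've';         // 5 code points, 6 UTF-8 bytes
  WriteLn(Length(S));            // 6: Length counts bytes, not code points
  WriteLn(CopyByIndex(S) = S);   // TRUE: the byte-for-byte copy is exact
end.
```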

--
cobines


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread Marco van de Voort
In our previous episode, DaWorm said:
> But isn't it O(n^2) only when actually using unicode strings?
> Wouldn't you also be able to do something like String.Encoding := Ansi
> and then all String[i] accesses would then be o(n) + x (where x is the
> overhead of run time checking that it is safe to just use a memory
> offset, presumably fairly short)? Of course it would be up to the user
> to choose to reencode some string he got from the RTL or FCL that way
> and understand the consequences.

It is possible, but that state can't be in the string/object because for
read-only access strings are shared. (not doing so incurs a lot of copying
overhead)

So that means that you need to allocate that state locally, either
explicitly by manually allocating an iterator object (as Jonas already
explained) or implicitly on the stack. The latter requires a native string
type though, and is therefore hard with objects.

Implicit methods also have the disadvantage that the compiler must recognize
the access pattern. So usually that means only the simplest of cases (or
e.g. only when for..in is used)
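
As an illustration of the explicit variant, here is a minimal sketch in which all iteration state lives in a small record owned by the caller, so the (possibly shared) string is only ever read. The names TCodePointCursor and NextCodePoint are hypothetical, and the decoder assumes well-formed UTF-8 stored in an AnsiString.

```pascal
program CursorDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical cursor type: the position is kept in the caller's
  // local variable, never inside the string itself.
  TCodePointCursor = record
    NextPos: Integer; // 1-based byte position of the next unread sequence
  end;

// Decodes the next code point of S into CP and advances the cursor.
// Returns False at the end of the string. Assumes well-formed UTF-8.
function NextCodePoint(const S: AnsiString; var C: TCodePointCursor;
  out CP: LongWord): Boolean;
var
  B: Byte;
  Extra, K: Integer;
begin
  CP := 0;
  Result := C.NextPos <= Length(S);
  if not Result then
    Exit;
  B := Ord(S[C.NextPos]);
  if B < $80 then begin CP := B; Extra := 0; end
  else if (B and $E0) = $C0 then begin CP := B and $1F; Extra := 1; end
  else if (B and $F0) = $E0 then begin CP := B and $0F; Extra := 2; end
  else begin CP := B and $07; Extra := 3; end;
  // Do not read past the end if the last sequence is truncated.
  if C.NextPos + Extra > Length(S) then
    Extra := Length(S) - C.NextPos;
  for K := 1 to Extra do
    CP := (CP shl 6) or LongWord(Ord(S[C.NextPos + K]) and $3F);
  Inc(C.NextPos, Extra + 1);
end;

var
  S: AnsiString;
  C: TCodePointCursor;
  CP: LongWord;
begin
  S := 'na'#$C3#$AF've';   // contains U+00EF, written as UTF-8 bytes
  C.NextPos := 1;
  while NextCodePoint(S, C, CP) do
    WriteLn(CP);           // prints 110, 97, 239, 118, 101, one per line
end.
```

Each step costs time proportional to one sequence, so a full pass stays linear, and two threads can walk the same string with separate cursors.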

> What assumptions are the typical String[i] user going to make about
> what is returned? 

IMHO development should not be driven by the users' assumptions.

If so, we would now have UIs with one red button with the text "do what I
think", since that seems to be what most users want and expect :-)

> There will be the types that are seeing if the fifth character is a 'C' or
> something like that, and for those there probably isn't too much that is
> going to go wrong, they might have to switch to "C" instead, or the
> compiler can make the 'C' literal a "unicode char which is really a
> string" conversion at compile time. 

This is a very rare case. With the increasing internationalization of
applications operations on such literals are even rarer.



Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread DaWorm
On Sun, Sep 18, 2011 at 12:01 PM, Sven Barth
 wrote:
> On 18.09.2011 17:48, DaWorm wrote:

But isn't it O(n^2) only when actually using unicode strings?
Wouldn't you also be able to do something like String.Encoding := Ansi
and then all String[i] accesses would then be o(n) + x (where x is the
overhead of run time checking that it is safe to just use a memory
offset, presumably fairly short)? Of course it would be up to the user
to choose to reencode some string he got from the RTL or FCL that way
and understand the consequences.

What assumptions are the typical String[i] user going to make about
what is returned?  There will be the types that are seeing if the
fifth character is a 'C' or something like that, and for those there
probably isn't too much that is going to go wrong, they might have to
switch to "C" instead, or the compiler can make the 'C' literal a
"unicode char which is really a string" conversion at compile time.
There may be the ones that want to turn a 'C' into a 'c' by flipping
the 6th bit, and that will indeed break, and in a Unicode world,
perhaps that should break, forcing using LowerCase as needed.  And
there are those (such as myself) who often use strings as buffers for
things like serial comms.  That code will totally break if I were to
try to use a unicode string buffer, but a simple addition of
String.Encoding := ANSI or RawByteString or ShortString in the first
line would fix that, or I could bite the bullet and recode that quick
and dirty code the right way.  My point is that trying to keep the bad
habits of a single byte string world in a unicode world is
counterproductive.  They aren't the same, and all attempts to make
them the same just cause more problems than they solve.

As for the RTL and FCL, presumably they wouldn't be doing any of this
String[i] stuff in the first place, would they? So they aren't going to
suffer that speed penalty.  Just because one type of code is slow,
doesn't mean everything is slow.

Jeff.


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread Sven Barth

On 18.09.2011 17:48, DaWorm wrote:


> On Sep 18, 2011 5:50 AM, "Marco van de Voort" <mar...@stack.nl> wrote:
>>
>>  The trouble is that it is not that easy, consider the first thing a
>> long time pascal user will do is fix his existing code which has many
>> constructs that loop over a string:
>>
>> setlength(s2,length(s1));
>> for i:=1 to length(s1) do
>>  s2[i]:=s1[i];
>>
>> Now, to return codepoint[i], you need to parse all codepoints before [i].
>>
>> So instead of O(n) this loop suddenly becomes O(n^2)
>
> Sure it does.  So what?  The point is, it will do what the user
> expects.  And for most users, the fact that it does it slowly won't even
> matter.  For those whom it does matter, it is a chance for them to learn
> the right way. Like I said in my first post, this is an extremely
> complex subject.  I think trying to optimize user code before they even
> write it adds even more complexity, which slows implementation down.
> Get something that works and gives the expected results first, worry
> about speed later.  By the time you finish, the CPU speed will have
> caught up to you.


Let me quote a saying by Pascal's father Wirth:

"Software is getting slower more rapidly than hardware becomes faster."
(see also here: http://en.wikipedia.org/wiki/Wirth%27s_law )

I personally see no reason for Pascal to become (much) slower only because 
we want to support code page aware strings (and O(n^2) IS much slower 
than O(n)).


Regards,
Sven


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread Marco van de Voort
In our previous episode, DaWorm said:
> >
> > So instead of O(n) this loop suddenly becomes O(n^2)
> 
> Sure it does.  So what? 

So much!

> The point is, it will do what the user expects.

No it doesn't. The user has no clue, and will just stumble on the next
detail (like codepoints not being characters).

> Like I said in my first post, this is an extremely complex subject.  I think
> trying to optimize user code before they even write it adds even more
> complexity, which slows implementation down. 

As often repeated: IMHO users can make such decisions for their app logic.

The libraries (and most of Lazarus) can't make such speed and Unicode subset
assumptions, and they are heavy "string" users too.

And of course, finally, there is the matter with Delphi compatibility.


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread DaWorm
On Sep 18, 2011 5:50 AM, "Marco van de Voort"  wrote:
>
>  The trouble is that it is not that easy, consider the first thing a
> long time pascal user will do is fix his existing code which has many
> constructs that loop over a string:
>
> setlength(s2,length(s1));
> for i:=1 to length(s1) do
>  s2[i]:=s1[i];
>
> Now, to return codepoint[i], you need to parse all codepoints before [i].
>
> So instead of O(n) this loop suddenly becomes O(n^2)

Sure it does.  So what?  The point is, it will do what the user expects.
And for most users, the fact that it does it slowly won't even matter.  For
those whom it does matter, it is a chance for them to learn the right way.
Like I said in my first post, this is an extremely complex subject.  I think
trying to optimize user code before they even write it adds even more
complexity, which slows implementation down.  Get something that works and
gives the expected results first, worry about speed later.  By the time you
finish, the CPU speed will have caught up to you.


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread Jonas Maebe

On 18 Sep 2011, at 13:57, Flávio Etrusco wrote:

> One obvious way to mitigate this would be to store the last
> CodePoint->Char in the string record, so that at least the most common
> case is covered.

... and so that the common case is broken in multithreaded environments.

Directly indexing a string will most likely always work using fixed-length 
steps (8, 16, 32 bit). If you want to iterate based on anything else (such as 
code points), use some kind of iterator model instead.


Jonas


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread Flávio Etrusco
On Sun, Sep 18, 2011 at 6:50 AM, Marco van de Voort  wrote:
> In our previous episode, Flávio Etrusco said:
>>
>> That's somewhat what I was thinking. Actually something like
>>
>>   UnicodeString = object
>>   (...)
> Such ability is not unique for an object. One can also do something like
> that with a native type.
>


Of course. That wasn't meant as a real implementation, I just decided
to write some code instead of explaining in words.
Basically my point, aimed at people discussing endlessly without any
data or observations, was that FPC already provides many of the tools
needed to build a non-native implementation and gather real, practical
data.

> It was discussed and rejected.
>  The trouble is that it is not that easy, consider the first thing a
> long time pascal user will do is fix his existing code which has many
> constructs that loop over a string:
>
> setlength(s2,length(s1));
> for i:=1 to length(s1) do
>  s2[i]:=s1[i];
>
> Now, to return codepoint[i], you need to parse all codepoints before [i].
>
> So instead of O(n) this loop suddenly becomes O(n^2)

I hope then that either I'm wrong or that you change your mind ;-)
IMHO what must be changed is the way to deal with strings.
I must assume from this preoccupation that you're talking about a
directive to make the String keyword instantiate a UnicodeString?
Also IMVHO in that compiler mode the code just needs to work, not to be
fast, and the user code can then be updated/fixed.
One obvious way to mitigate this would be to store the last
CodePoint->Char in the string record, so that at least the most common
case is covered.

Best regards,
Flávio

PS. Sorry for the double post, Marco.


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread Marco van de Voort
In our previous episode, Flávio Etrusco said:
> 
> That's somewhat what I was thinking. Actually something like
> 
>   UnicodeString = object
>   strict private
>     FEncoding: Integer;
>     FBuffer: AnsiString;
>     function GetCodePointAt(AIndex: SizeInt): Integer;
>     procedure SetCodePoint(AIndex: SizeInt; p_Value: Integer);
>   public
>     property CodePoint[AIndex: SizeInt]: Integer read GetCodePointAt
>       write SetCodePoint; default;
>   end;
> 
> I just don't know whether something like this is already implemented in the
> test branches, at least for -err- testing...

Such ability is not unique for an object. One can also do something like
that with a native type. It was discussed and rejected.

The trouble is that it is not that easy: the first thing a long-time
Pascal user will do is fix his existing code, which has many constructs
that loop over a string:

setlength(s2,length(s1));
for i:=1 to length(s1) do
  s2[i]:=s1[i];

Now, to return codepoint[i], you need to parse all codepoints before [i].

So instead of O(n) this loop suddenly becomes O(n^2)
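
To make the cost visible, here is a minimal sketch of what a code-point-based s1[i] would have to do on a variable-width encoding. It assumes well-formed UTF-8 stored in an AnsiString; SeqLen and ByteOffsetOfCodePoint are hypothetical helpers, not RTL routines. Every lookup restarts at the first byte, so indexing code points 1..n in a loop costs 1 + 2 + ... + n = n(n+1)/2 steps.

```pascal
program CodePointIndexDemo;
{$mode objfpc}{$H+}

// Number of bytes in the UTF-8 sequence that starts with lead byte B.
// Assumes well-formed input; anything unexpected counts as length 1.
function SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// 1-based byte offset of the Index-th code point, or 0 if out of range.
// It has to walk over every code point before Index: O(Index) per call.
function ByteOffsetOfCodePoint(const S: AnsiString; Index: Integer): Integer;
var
  P, Seen: Integer;
begin
  Result := 0;
  P := 1;
  Seen := 0;
  while P <= Length(S) do
  begin
    Inc(Seen);
    if Seen = Index then
    begin
      Result := P;
      Exit;
    end;
    Inc(P, SeqLen(Ord(S[P])));
  end;
end;

var
  S: AnsiString;
  I: Integer;
begin
  S := 'na'#$C3#$AF've';   // 5 code points, 6 bytes; the third takes 2 bytes
  for I := 1 to 5 do
    Write(ByteOffsetOfCodePoint(S, I), ' ');   // prints: 1 2 3 5 6
  WriteLn;
end.
```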


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-18 Thread Sven Barth

On 18.09.2011 02:22, Flávio Etrusco wrote:

> On Sat, Sep 17, 2011 at 10:59 AM, DaWorm wrote:
>
>> This might be total crap, so bear with me a moment.  In an object like
>> a Stringlist, there is a default property such as Strings, such that
>> List.Strings[1] is equivalent to List[1], is there not?  If, as in
>> .NET or Java, all strings become objects, then you could have a String
>> object whose default property is Chars, whose type isn't really a
>> char, but another String whose length is one entity.
>
> That's somewhat what I was thinking. Actually something like
>
>   UnicodeString = object
>   strict private
>     FEncoding: Integer;
>     FBuffer: AnsiString;
>     function GetCodePointAt(AIndex: SizeInt): Integer;
>     procedure SetCodePoint(AIndex: SizeInt; p_Value: Integer);
>   public
>     property CodePoint[AIndex: SizeInt]: Integer read GetCodePointAt
>       write SetCodePoint; default;
>   end;
>
> I just don't know whether something like this is already implemented in the
> test branches, at least for -err- testing...


Well... you can now take a look at trunk as well, because the changes 
from cpstrnew have been merged yesterday.


Regards,
Sven



Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-17 Thread Flávio Etrusco
On Sat, Sep 17, 2011 at 10:59 AM, DaWorm  wrote:
> This might be total crap, so bear with me a moment,  In an object like
> a Stringlist, there is a default property such as Strings, such that
> List.Strings[1] is equivalent to List[1], is there not?  If, as in
> .NET or Java, all strings become objects, then you could have a String
> object whose default property is Chars, whose type isn't really a
> char, but another String whose length is one entity.

That's somewhat what I was thinking. Actually something like

  UnicodeString = object
  strict private
    FEncoding: Integer;
    FBuffer: AnsiString;
    function GetCodePointAt(AIndex: SizeInt): Integer;
    procedure SetCodePoint(AIndex: SizeInt; p_Value: Integer);
  public
    property CodePoint[AIndex: SizeInt]: Integer read GetCodePointAt
      write SetCodePoint; default;
  end;


I just don't know whether something like this is already implemented in the
test branches, at least for -err- testing...

-Flávio


Re: RE : [fpc-devel] Unicode support (yet again)

2011-09-17 Thread DaWorm
This might be total crap, so bear with me a moment.  In an object like
a Stringlist, there is a default property such as Strings, such that
List.Strings[1] is equivalent to List[1], is there not?  If, as in
.NET or Java, all strings become objects, then you could have a String
object whose default property is Chars, whose type isn't really a
char, but another String whose length is one entity.  Thus code that
was written as MyString[1] would essentially behave the same as for
old shortstrings (ignoring for a moment the difference between ' and "
in specifying a literal char or string).  This string object could
have a property called Encoding that determined if it was UTF8 or
UTF16 or ANSI or raw bytes, and methods to trigger a direct
conversion, and AsXXX properties to convert for use in routines that
needed a different encoding, the need for which can be determined at
run time.

This may not be even feasible, and it will break some legacy code, but
I think those used to old style strings could get used to it very
quickly, and most of the details would be buried in the RTL where
people who didn't want to see them would not.  Of course, any code
using move or treating a section of shortstring as an open array would
still break, but perhaps that is a good thing.  We aren't running on
8088's any more so dangerous tricks in search of speed shouldn't be
needed as often, if at all.

Those who are looking for super efficiency won't find it here, but I
think months of discussions and thousands of messages pretty much
prove this is a complex task and only a complex solution will solve
it.  Even if the solution is nothing like the above, it is fairly
obvious that any attempt at a simple "one type fits all" solution
won't work, and if you are going to have to have multiple solutions,
to me it is better to bite the bullet and implement a full solution.

Jeff.