Re: [fpc-devel] Free Pascal 2.4.2 minimal distros for fpGUI available

2010-11-17 Thread Michael Schnell

On 11/16/2010 02:52 PM, Paul Breneman wrote:



> I try hard to write Pascal programs without using pointers and without
> allocating or deallocating memory.

Oops. How is this possible :) ?  And in what way does this help with that
issue ? I do see that memory allocation/deallocation is a source of very
hard-to-find bugs, but avoiding it makes many things close to impossible.


-Michael


___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Boehm garbage collector for freepascal

2010-11-17 Thread Michael Schnell

On 11/15/2010 02:04 PM, Thaddy wrote: ...

If you want to extend a compiler to allow for garbage collection, would
it not be appropriate to have it manage an additional type of variable
that is a "garbage collection enabled" pointer, and have it always
generate double-indirect accesses to variables of this type ? The
"garbage collection enabled memory allocator" would manage a list of
these variables and see to it that the value of the pointer is an entry
in this list, which in turn is a pointer to the instance. That way the
garbage collector can do its work without the user program knowing
about it.


Of course there are lots of issues regarding possibly excessive latency
while the garbage collector is running (and the main program is thus
held off), and with threads ...
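
The double-indirection idea above can be sketched as a small handle table. This is only an illustration, assuming a toy allocator: the class `HandleTable` and its methods are hypothetical names, not part of FPC or the Boehm GC. User code only ever holds a handle, so the collector is free to move objects and update the table behind its back.

```python
class HandleTable:
    """Toy handle table: user code holds integer handles, never raw slots."""

    def __init__(self):
        self._slots = []            # stands in for raw memory locations
        self._handle_to_slot = {}   # handle -> current slot index
        self._next_handle = 0

    def alloc(self, obj):
        """Store obj and return a handle that stays valid across compaction."""
        self._slots.append(obj)
        handle = self._next_handle
        self._next_handle += 1
        self._handle_to_slot[handle] = len(self._slots) - 1
        return handle

    def deref(self, handle):
        """Double indirection: handle -> slot index -> object."""
        return self._slots[self._handle_to_slot[handle]]

    def free(self, handle):
        """Drop the handle and mark its slot as unused."""
        slot = self._handle_to_slot.pop(handle)
        self._slots[slot] = None

    def compact(self):
        """Move live objects together; only the table entries change."""
        new_slots = []
        for handle, slot in list(self._handle_to_slot.items()):
            new_slots.append(self._slots[slot])
            self._handle_to_slot[handle] = len(new_slots) - 1
        self._slots = new_slots


table = HandleTable()
a = table.alloc("first")
b = table.alloc("second")
table.free(a)
table.compact()                 # "second" moves from slot 1 to slot 0
print(table.deref(b))           # the handle b is still valid
```

The price is the one named above: every access pays for an extra lookup, and the table itself must be protected when threads are involved.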


-Michael




Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said:
> > 
> > I don't consider it an extreme, on the contrary. Trying to fix this is
> > extreme IMHO.
> 
> Sorry, I understood that you want to replace all for loops by iterated 
> loops.

Only the ones where surrogates really matter. 
 
> >>> And in memory.
> >> That's the design flaw, IMO. When UTF-8 strings are re-encoded before 
> >> processing,
> > 
> > How, when? How do you avoid repeated encoding/decoding/encoding cycles?
> 
> The encoding is only changed on demand. 

> Once converted into a computational format (e.g.  UCS2), the internal
> representation deserves no more changes.  Another conversion may occur
> when a copy in a different encoding is requested; since a copy operation
> already is O(n), a conversion will not increase that order.

This sounds awfully like the argument long ago that ansistrings wouldn't
eventually be slower than strings, since the optimizer would optimize nested
inc/decref calls away, so that there would be no more than
one call per string per procedure.  Afaik, I'm still waiting (and realize
now that it is a lot harder since the refcount has to match pretty closely
due to e.g. the possibility of exceptions)

Yes, you can throw it all on future optimization, but I am more of a
bird-in-the-hand kind of guy.

> The result were a general Unicode string class, not bound to a specific 
> encoding, similar to the Delphi Unicode string representation.

I see no implementation details here at all that support such a statement. I
see a wish, not a plan. The devil is always in the details.

> > I don't see why a string should be a class. In FPC it is already a first
> > class type.
> 
> Name it as you like, but a first class string type is only a container, 

... of chars.

> that can contain data of any kind - every data type or structure can be 
> seen as a collection of bytes or chars.

I don't see the any kind stuff. Yes, every type that has a variable memory
footprint can store everything in theory, but that is as far as it goes.

> Take Unicode, zip or graphics encodings as examples for data types, that 
> can be stored in such a container, and you'll see that every different 
> encoding deserves different support functions - the data type becomes 
> polymorphic.

1. I don't see zip or graphics encodings
2. Since there are some variable requirements on the type, I don't see a
good reason to scale that up to "all"


> Having reached that point I see a need for a base class, that reflects the
> *nature* of the information, from which classes for specific encodings can
> be derived.  A user will be happy with the abstract type (see TStrings),
> and must not bother with specific encodings, that can result in bad
> behaviour when not chosen properly.

You totally lost me here.


Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Michael Schnell

On 11/15/2010 01:24 PM, Marco van de Voort wrote:


> Typically I'd iterate by means outside the language (I've used simple iterators
> based on a record with a few inline methods in the past), and review the
> places where you iterate by char through strings, and reduce it
> significantly.
>
> Since the latter is needed for optimal result in nearly every solution.

While I of course do see that in many cases it does make sense to use a
full-blown enumerator to do a loop that traverses a UTF-8 (or other
Unicode) string in terms of Unicode characters, with a loop variable
that represents the position of the Unicode character in the string (in
terms of what ? ), I feel that there are also many cases that only ask
for a simple loop that just offers a loop variable coded as a 32-bit UCS
Unicode character or as a UTF-8 string.
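
A "simple loop" of the kind asked for here could look like the sketch below. It is an illustration only, with the hypothetical name `utf8_codepoints`. It yields the byte offset (one possible answer to "in terms of what ?") together with the 32-bit code point, and it assumes well-formed input: overlong forms and encoded surrogates are not rejected.

```python
def utf8_codepoints(data: bytes):
    """Yield (byte_offset, code_point) pairs for a UTF-8 byte string."""
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        # The lead byte alone tells us how long the sequence is.
        if lead < 0x80:
            length, cp = 1, lead
        elif lead >> 5 == 0b110:
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            length, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:
            length, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte contributes its low 6 bits.
        for k in range(1, length):
            cont = data[i + k]
            if cont >> 6 != 0b10:
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)
        yield i, cp
        i += length


for offset, cp in utf8_codepoints("a\u00eb\u20ac".encode("utf-8")):
    print(offset, hex(cp))
```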


-Michael


Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Michael Schnell

On 11/17/2010 10:12 AM, Marco van de Voort wrote:


> Only the ones where surrogates really matter.

Is it really viable to have the compiler/RTL try to automatically handle
these ugly beasts, rather than presenting them to the poor user as two
separate Unicode characters (and only handle the UCS/UTF coding issues
automatically) ?


-Michael


Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> >
> > Only the ones where surrogates really matter.
> Is it really viable to have the compiler/RTL try to automatically handle
> these ugly beasts, 

It is not viable not to. Either you implement unicode or not. 

It's a user's own choice not to be unicode compliant in his apps (e.g. if he
knows he never goes to the Eastern Asiatic market etc), but a runtime should
be as unicode compliant as reasonably possible.





Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Michael Schnell

On 11/17/2010 12:02 PM, Marco van de Voort wrote:

> In our previous episode, Michael Schnell said:
> > > Only the ones where surrogates really matter.
> >
> > Is it really viable to have the compiler/RTL try to automatically handle
> > these ugly beasts,
>
> It is not viable not to. Either you implement unicode or not.
>
> It's a user's own choice not to be unicode compliant in his apps (e.g. if he
> knows he never goes to the Eastern Asiatic market etc), but a runtime should
> be as unicode compliant as reasonably possible.

Considering that handling surrogate pairs needs tables while UTF/UCS
handling can be done by simple algorithms, and that (AFAIK) surrogate
pairs are used only in certain environments (Mac and what else ?), I
think that viability is not an easy decision.


-Michael


Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Jonas Maebe


On 17 Nov 2010, at 12:23, Michael Schnell wrote:

> Regarding that handling surrogate pairs needs tables while UTF/UCS
> handling can be done by simple algorithms and that (AFAIK) surrogate
> pairs are used only in certain environments (Mac and what else ?)


Surrogate pairs have nothing to do with Mac OS X. Surrogate pairs are
required when encoding any codepoint in UTF-16 whose UTF-32 value is >=
$10000.
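
To make the rule concrete, here is a minimal sketch of the encoding direction, using the hypothetical helper name `to_utf16_units`. Code points below the supplementary planes are stored as one 16-bit unit; anything above is split into a high and a low surrogate by plain arithmetic.

```python
def to_utf16_units(cp: int):
    """Return the UTF-16 code units (as a list of ints) for one code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x10000:
        return [cp]                      # BMP: a single 16-bit unit
    v = cp - 0x10000                     # 20 bits remain
    high = 0xD800 | (v >> 10)            # top 10 bits
    low = 0xDC00 | (v & 0x3FF)           # bottom 10 bits
    return [high, low]


print([hex(u) for u in to_utf16_units(0x1D11E)])   # MUSICAL SYMBOL G CLEF
```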


What you are probably thinking of are decomposed characters (where e.g. "e"
and "¨" are encoded separately, instead of as "ë"). The RTL will never
do anything special about them, since they are two regular separate
codepoints. And then there's of course the fact that more than one
composed character can map to the same decomposed character, see e.g.
http://unicode.org/reports/tr15/#Primary_Exclusion_List_Table, and many
other issues listed on that page.


In general: if you want to assume that a unicode string is in a  
particular form, convert it to a particular canonical form and operate  
on that (and keep in mind that you may destroy data in the process,  
like with most code page conversions).
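
A short sketch of that advice, using Python's standard `unicodedata` module purely for illustration (this is not the FPC RTL). It assumes the combining diaeresis U+0308 is what is meant by the separately encoded "¨". The last lines show the kind of data loss mentioned above: the ANGSTROM SIGN is folded into a different code point and cannot be recovered afterwards.

```python
import unicodedata

composed = "\u00eb"        # ë as a single code point
decomposed = "e\u0308"     # e followed by COMBINING DIAERESIS

# The two spellings differ code point by code point...
print(composed == decomposed)

# ...until both are brought into the same canonical form.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed, nfd == decomposed)

# Normalization can lose the original spelling:
# U+212B ANGSTROM SIGN becomes U+00C5 under NFC.
angstrom = "\u212b"
print(hex(ord(unicodedata.normalize("NFC", angstrom))))
```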



Jonas


Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> > It is not viable not to. Either you implement unicode or not.
> >
> > It's a user's own choice not to be unicode compliant in his apps (e.g. if he
> > knows he never goes to the Eastern Asiatic market etc), but a runtime should
> > be as unicode compliant as reasonably possible.
> Regarding that handling surrogate pairs needs tables while UTF/UCS 
> handling can be done by simple algorithms 

Surrogate pairs enumeration can be done with simple algorithms. Maybe you
are confusing surrogate pairs with decomposition?

> (AFAIK) surrogate 
> pairs are used only in certain environments (Mac and what else ?) I think
> that viability is not an easy decision.

Anything that is not strictly UCS2 (afaik Win2000 and older, XP is already
UTF16 afaik) has surrogate pairs.

Regarding OS X, iirc I saw a mention somewhere that some components of Mac
OS X prefer decomposed characters (aka UTF-8Mac). I assume routing all
unicode routines through the Mac libc on OS X makes sure this will mostly
work fine.



Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Michael Schnell

On 11/17/2010 01:20 PM, Jonas Maebe wrote:



> Surrogate pairs have nothing to do with Mac OS X. Surrogate pairs are
> required when encoding any codepoint in UTF-16 whose UTF-32 value is >=
> $10000.

In fact I was not aware of the UTF-16 coding scheme. I _supposed_ it
would work similarly to UTF-8 (highest bit set => 32-bit value composed
from the 31 remaining bits of this and the next word, with bit 31 reset)
and thus could be decoded algorithmically. Seemingly I was wrong and a
huge table is needed to decode UTF-16.


But in fact we have just been discussing UTF-8. There, surrogates
are unnecessary, and thus using them seems "funny" to me.


-Michael


Re: [fpc-devel] Free Pascal 2.4.2 minimal distros for fpGUI available

2010-11-17 Thread Paul Breneman

Martin Schreiber wrote:
> On Tuesday, 16. November 2010 14.52:12 Paul Breneman wrote:
> > I'd like to take the minimal distros and add a simple option to use
> > MSEide and it supports debugging from what I understand.  Then maybe
> > extending that with remote debugging would be the next item.
>
> MSEide is ready to work with remote gdbserver and gdbproxy. I even use it for
> C development with AVR32.


That is good to know.  Thanks Martin!


Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Michael Schnell

On 11/17/2010 01:32 PM, Marco van de Voort wrote:

> Regarding OS X, iirc I saw a mention somewhere that some components of Mac
> OS X prefer decomposed characters. (aka UTF-8Mac).

In another forum I saw this mentioned as surrogate pairs. Sorry for the
confusion :(.


-Michael


Re: [fpc-devel] Free Pascal 2.4.2 minimal distros for fpGUI available

2010-11-17 Thread Michael Schnell

On 11/17/2010 02:10 PM, Paul Breneman wrote:

> Martin Schreiber wrote:
> > On Tuesday, 16. November 2010 14.52:12 Paul Breneman wrote:
> > > I'd like to take the minimal distros and add a simple option to use
> > > MSEide and it supports debugging from what I understand.  Then maybe
> > > extending that with remote debugging would be the next item.
> >
> > MSEide is ready to work with remote gdbserver and gdbproxy. I even
> > use it for C development with AVR32.
>
> That is good to know.  Thanks Martin!

Indeed !
So this should be possible with Lazarus, too. It would be great to hear
from someone here who has successfully tried it. :)


-Michael


Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Jonas Maebe


On 17 Nov 2010, at 13:44, Michael Schnell wrote:

> In fact I was not aware of the UTF-16 coding scheme. I _supposed_ it
> would work similar as UTF-8 (highest bit set => 32 bit value
> composed from the 31 remaining bits of this and the next word and
> bit 31 reset) and thus could be decoded algorithmically. Seemingly I
> was wrong and a huge table is needed to decode UTF16.


You don't need any table, see utf16toutf32 in 
http://svn.freepascal.org/svn/fpc/trunk/rtl/inc/wstrings.inc
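
For readers without the RTL source at hand, the table-free decoding can be sketched as follows. This is an independent illustration and not a transcription of `utf16toutf32`; the name `utf16_codepoints` is hypothetical. Surrogates are recognised by their numeric range alone, and the code point is rebuilt with shifts and additions.

```python
def utf16_codepoints(units):
    """Yield code points from an iterable of 16-bit UTF-16 code units."""
    it = iter(units)
    for u in it:
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: the next unit must be a low surrogate.
            low = next(it, None)
            if low is None or not 0xDC00 <= low <= 0xDFFF:
                raise ValueError("high surrogate without low surrogate")
            yield 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unexpected low surrogate")
        else:
            yield u                      # ordinary BMP code point


print([hex(cp) for cp in utf16_codepoints([0x61, 0xD834, 0xDD1E])])
```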


Jonas


Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Hans-Peter Diettrich

Marco van de Voort schrieb:


> It's a user's own choice not to be unicode compliant in his apps (e.g. if he
> knows he never goes to the Eastern Asiatic market etc), but a runtime should
> be as unicode compliant as reasonably possible.


IMO there exist levels of compliance.

The bottom level supplies storage facilities for Unicode. Strings are 
only stored and processed as a whole, never analyzed or modified. This 
level supports e.g. text display and storage in databases. A single 
internal Unicode representation is sufficient, e.g. UTF-16 for Windows 
or UTF-8 else.


In the next level dedicated string handling is added, for everyday use,
like for splitting and composing filenames; this is where basic
iteration and character classification support enters the scene, for
mostly internal use. Separator characters can be assumed to be ASCII, so
that they can be found by a dumb byte/char scan; only a few encodings have
to be recognized and handled, based on the char size: MBCS (UTF-8...),
WideChars (UTF-16/UCS2) and UTF-32.
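
The "dumb byte scan" works for UTF-8 because every byte of a multi-byte sequence is >= $80, so an ASCII separator byte can never appear inside another character. A minimal sketch, with the hypothetical name `split_path_utf8`; note that this property does not hold for every legacy MBCS code page (Shift-JIS trail bytes, for instance, can equal the backslash).

```python
def split_path_utf8(path: bytes, sep: bytes = b"/"):
    """Split a UTF-8 encoded path on an ASCII separator without decoding."""
    if len(sep) != 1 or sep[0] >= 0x80:
        raise ValueError("separator must be a single ASCII byte")
    # Safe for UTF-8: lead and continuation bytes are all >= 0x80.
    return path.split(sep)


raw = "h\u00e9llo/w\u00f6rld/readme.txt".encode("utf-8")
parts = split_path_utf8(raw)
print([p.decode("utf-8") for p in parts])
```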


The aforementioned levels IMO can be considered part of the RTL.

Next comes codepage-specific handling, usable by coders who are
familiar with the specific language. Here more basic parsing features
are added, like for whitespace, punctuation, words, numbers etc.  Added
are character classification and conversion (upper/lower), which requires
an implementation of Unicode character sets, as a replacement for the
restricted 256-element sets. The Unicode BMP eventually can be treated
as one such codepage, so that the many Unicode separators (spaces,
dashes...) can be handled in a uniform way.


This level can be implemented in language/codepage specific packages, 
whose maintenance requires more than only coding skills.


The top level is text processing, where knowledge of the language is
indispensable, and special libraries are required. Only at this level is
string composition and decomposition at the single-character level
required, taking into account ambiguous encodings, ligatures, and the
other hassles introduced by full Unicode. The grammar of a language
becomes important, e.g. for proper spelling and breaking of words.


DoDi



Re: [fpc-devel] Free Pascal 2.4.2 minimal distros for fpGUI available

2010-11-17 Thread Paul Breneman

Michael Schnell wrote:
> On 11/16/2010 02:52 PM, Paul Breneman wrote:
> > I try hard to write Pascal programs without using pointers and without
> > allocating or deallocating memory.
>
> Oops. How is this possible :) ?  And in what way does this help with that
> issue ? I do see that memory allocation/deallocation is a source of very
> hard-to-find bugs, but avoiding it makes many things close to impossible.


I don't doubt that there are certain domains where application-level 
memory allocation/deallocation is very necessary.  But for the type of 
programs I usually write it isn't.  I've written a lot of machine 
control projects with TurboPascal starting in 1985.  The only strings 
(short) at that time were still long enough for the serial data buffers 
I needed.


In 1985 I started working on an existing real-time video editing program 
written in PDP-11 assembler.  In the early 90s I ported that to pure 
TurboPascal (no assembler) with ~50,000 source lines.  The only 
interrupts were for the keyboard and also a video timing interrupt (50 
or 60 Hz) and about one-third of the code lived in the timing interrupt 
routine.  One system had 24 RS-422 serial ports (38,400 baud) in an 
inexpensive RadioShack Tandy 386 computer, which was the main computer 
that controlled about $2 million of digital video editing equipment. 
Quite a funny picture in a way (a peasant running a rich kingdom)...


Several fellows who developed such PC-based editors before me had to use 
expensive and complex intelligent multi-port serial port cards but 
thankfully I waited just long enough that the 16550 UART (with the 16 
byte FIFO) had arrived on the scene so my system was much simpler.  All 
of the commands (during the real-time recording) were 16 bytes or less 
so using 16550 UARTS it all worked without interrupts for the serial ports.
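
A back-of-the-envelope check of why this could work, under assumptions that are not stated in the message: 10 or 11 bits on the wire per byte (start, 8 data, optional parity, stop), and the ports being serviced once per video timing interrupt. The function names are made up for this illustration.

```python
def command_time_ms(n_bytes, baud=38400, bits_per_frame=10):
    """Time on the wire for n_bytes, in milliseconds."""
    return n_bytes * bits_per_frame * 1000.0 / baud


def field_period_ms(rate_hz):
    """Time between two video timing interrupts, in milliseconds."""
    return 1000.0 / rate_hz


for bits in (10, 11):
    # A full 16-byte command versus one 60 Hz and one 50 Hz field.
    print(bits, round(command_time_ms(16, bits_per_frame=bits), 2),
          round(field_period_ms(60), 2), round(field_period_ms(50), 2))
```

Under these assumptions a complete 16-byte command takes well under one field period to transfer and fits entirely in the FIFO, so the UART can absorb it without a per-byte interrupt.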


I just went back and searched the ~50,000 lines of source, and there is 
not a single New/Dispose (nor Mark/Release) in all of my code.  My 
program never crashed (even at the beginning beta stage) and I never 
used a debugger but rather a little logging when needed.


All video editing is now done with fast computers but here is a freeware 
version of that program that can utilize 9 serial ports:

  http://www.brenemanlabs.com/MachOne5.htm
  http://www.brenemanlabs.com/Mach1I.htm

One of my favorite authors (Jack Crenshaw) likes TurboPascal a lot, and 
(if I remember right) years ago he wrote something along the lines of: 
When I compile a C program I'm surprised if it runs the first time, but 
when I compile a TurboPascal program I'm surprised if it doesn't run the 
first time.




Re: [fpc-devel] Boehm garbage collector for freepascal

2010-11-17 Thread Thaddy

On 17-11-2010 10:01, Michael Schnell wrote:

> On 11/15/2010 02:04 PM, Thaddy wrote: ...
>
> If you want to extend a compiler to allow for garbage collection,
> would it not be appropriate to have it manage an additional type of
> variables  that is supposed to be "garbage collection enabled"
> pointers and have it always create double indirect accesses to this
> type of variables. The "garbage collection enabled memory allocator"
> would manage a list of these variables and see that the value of the
> pointer is an entry in this list that again is a pointer to the
> instance. Now that garbage collector can do its work without the user
> program knowing about this.

Well, basically, all these things have already been answered by the way
the Boehm GC's architecture has evolved. Threading is possible (a
thread-local (sub)allocator/collector), you can mark memory as not GC'd, etc.
The Boehm GC is a very mature piece of software engineering; that's why
e.g. it's the default for Mono and for the GNU static (not the bytecode)
Java compiler. Note the "static": this paradigm is very close in
implementation, because it also assumes heap-allocated objects etc.

In general, I have read nothing new that isn't already taken care of.
The implementation is pretty straightforward.
Reading the documentation and historical conversations is another thing
altogether.


As it stands, my version as presented is not thread safe, though simply
recompiling the Boehm GC with threading enabled is enough. The Freepascal
interface does not change.
There is even a mechanism in the Boehm GC that allows it to recognize what
kind of memory is requested, by registering the type of an object, not
the object itself, with the collector.
This is used extensively in the static Java compiler, for example, and is
thoroughly documented.


Also: a GC is less reliable for, for example, real-time applications,
where the GC can indeed interfere, but there are calls to delay
collection and to perform the collection immediately.
Modern measurements show - and most in the know agree - that in
applications with many small allocations a GC may be faster, more
performant, than classic alloc/free.
There is a performance penalty only with large allocations (at least
under Windows, i386) with blocks of 100K+, but you can always allocate
these manually if it becomes a problem.
And in the standard Delphi memory manager these big allocations are also
deferred to OS calls (the VirtualAlloc family of Win32 calls).
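
The "delay collection, then collect at a convenient moment" pattern can be sketched by analogy with Python's standard `gc` module. This is only an analogy: CPython's collector is a cycle collector on top of reference counting, not a conservative collector like Boehm's, and the helper name `collection_deferred` is made up. The Boehm C API offers comparable calls (to my knowledge `GC_disable`, `GC_enable` and `GC_gcollect`).

```python
import gc
from contextlib import contextmanager


@contextmanager
def collection_deferred():
    """Suspend automatic collection for a time-critical section."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()        # pay the collection cost now, at a known point


with collection_deferred():
    # Time-critical work: no automatic collection pauses in here.
    buffers = [bytearray(64) for _ in range(1000)]
print(gc.isenabled())
```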


Boehm GC is not slow, on the contrary: the fact that it is conservative
makes it faster. More specifically, in real applications (especially server
applications, database handling etc) it may be faster, at the expense of
memory resources, but that is a given for many a speed-up anyway.

Boehm GC can be compiled for thread safety very simply (1 compile switch).

To summarize: what you write/propose is already in there :)

I have rather good initial results with it, even compiling medium/large
Freepascal projects with it, e.g. fpGUI and fpGUI designer ;-) :-)


I never proposed to make the GC mm the default, though. But it might be 
an advantage for some projects as the discussions over the years implied.
When you carefully read the historical trail you realize most of the 
arguments here are really old school and already falsified by the Boehm 
community (except for stack based object allocations that is!)


Thaddy