Re: [Pharo-project] [Vm-dev] Re: Plan/discussion/communication around new object format

2012-06-11 Thread Igor Stasenko
Some extra ideas.

1. Avoiding the extra header for big objects.
I'm not sure about this, but still...

according to Eliot's design:
8: slot size (255 => extra header word with large size)

What if we extend the size field to 16 bits (so it can count up to 65535 slots)
and add a single flag saying how to calculate the object size:

flag(0)  object size = (size field) * 8
flag(1)  object size = 2^(size field)

which means that past 2^16 (or however many bits we dedicate to the size field
in the header) all object sizes
will be powers of two.
Since most objects will fit under 2^16, we don't lose much.
For big arrays, we could have a special collection/array which
stores the exact size in its inst var (and we don't even need to care in
the cases of Sets/Dicts/OrderedCollections).
We can also make it transparent:

Array class>>new: size
  size > (max exact size ) ifTrue: [ ^ ArrayWithBigSizeWhatever new: size ]

Of course, care must be taken with those variable classes which
can potentially hold large amounts of bytes (like Bitmap).
But I think code can quickly be adapted to this feature of the VM, which
will simply fail the #new: primitive
if the size is not a power of two for sizes greater than the max "exact size"
that fits into the size field of the header.
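
To make the encoding concrete, here is a minimal Python sketch of the flag/size-field idea. It is only an illustration: the 16-bit width, the function names, and the choice to count slots rather than bytes are assumptions, not part of any actual VM.

```python
SIZE_BITS = 16                      # assumed width of the size field
MAX_EXACT = (1 << SIZE_BITS) - 1    # largest size storable exactly


def encode_slots(n):
    """Return (flag, field) for an object of n slots.

    flag 0: field holds the exact slot count.
    flag 1: field holds an exponent, the size is 2**field.
    """
    if n < 0:
        raise ValueError("negative size")
    if n <= MAX_EXACT:
        return (0, n)
    if n & (n - 1) == 0:            # n is a power of two
        return (1, n.bit_length() - 1)
    raise ValueError("sizes above MAX_EXACT must be powers of two")


def decode_slots(flag, field):
    """Inverse of encode_slots."""
    return field if flag == 0 else 1 << field


def round_up_pow2(n):
    """Smallest power of two >= n (n >= 1), what a big-array wrapper
    would request before recording the exact size in an inst var."""
    return 1 << (n - 1).bit_length()
```

A wrapper collection would call something like round_up_pow2 for the allocation and keep the requested size itself.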


2. Slot for arbitrary properties.
If you read carefully, Eliot said that to make lazy become work it is
necessary to always have some extra space per object, even if the object
doesn't have any fields:

<>

So this fits quite well with the idea of having a slot for dynamic
properties per object. What if, instead of "extending the object" when it
requires an extra properties slot, we just reserve the slot for
properties at the very beginning:

[ header ]
[ properties slot ]
... rest of data ...

So any object will have that slot. And in the case of lazy become, we can
use that slot for holding the forwarding pointer. Voila.
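
A rough Python model of that reuse, purely as a sketch: HeapObject, lazy_become and follow are made-up names, and a real VM would work on raw memory words rather than Python objects.

```python
class HeapObject:
    """Toy object: a forwarded flag, one always-present slot, then data."""

    def __init__(self, *fields):
        self.forwarded = False
        # Holds the properties object normally, or the forwarding
        # target once the object has been become'd (assumption).
        self.properties = None
        self.fields = list(fields)


def lazy_become(old, new):
    """Turn old into a corpse that forwards to new."""
    old.forwarded = True
    old.properties = new    # the reserved slot now holds the forwarding pointer
    old.fields = []         # the old contents are dead


def follow(obj):
    """Chase forwarding pointers until a live object is reached."""
    while obj.forwarded:
        obj = obj.properties
    return obj
```

Because the slot exists even for objects with no fields, there is always room for the forwarding pointer.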

3. From 2. we go straight back to the hash. The VM doesn't need to know
such a thing as an object's hash; it carries no semantic load inside the VM, which
just answers those bits through a single primitive.

So why is it an enforced, inherent property of all objects in the
system? And why does nobody ask: if we have that one, why could we not
have more than one, or as many as we want? This is my central question
around the idea of having per-object properties.
Once the VM guarantees that any object can have at least one slot for
storing an object reference (the property slot),
the VM no longer needs to care about identity hash,

because it can be implemented completely on the language side. But most of
all, we are NO longer limited in
how big or small hash values are, which converts directly into a bonus: fewer
hash collisions => more performance. Want a 64-bit hash? 128-bit?
Whatever you desire:

Object>>identityHash
   ^ self propertiesAt: #hash ifAbsentPut: [ HashGenerator newHashValue ]
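
The same idea as a small runnable Python sketch. Obj, properties_at_if_absent_put and the counter-based generator are stand-ins invented for the illustration (the Smalltalk selectors above are themselves hypothetical); the point is only that the hash lives in the properties slot and is created on first request.

```python
import itertools


class Obj:
    """Toy object with one reserved properties slot."""
    __slots__ = ("properties", "data")

    def __init__(self, data=None):
        self.properties = None   # nothing allocated until first use
        self.data = data


_hash_generator = itertools.count(1)   # stand-in for HashGenerator


def properties_at_if_absent_put(obj, key, make_value):
    """Fetch a property, creating the table and the value on demand."""
    if obj.properties is None:
        obj.properties = {}
    if key not in obj.properties:
        obj.properties[key] = make_value()
    return obj.properties[key]


def identity_hash(obj):
    # Any width is possible here, since the value is an ordinary object.
    return properties_at_if_absent_put(
        obj, "hash", lambda: next(_hash_generator))
```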

And once we have per-object properties and lazy become, things
like Magma will get HUGE benefits straight out of the box.
Because look: lazy become and immutability address many
problems related to OODB implementation
(I can barely see other use cases where immutability would be as useful
as in an OODB),
so for me it is logical to take this last step: by adding arbitrary
properties, an OODB can now store the ID there.

-- 
Best regards,
Igor Stasenko.



Re: [Pharo-project] [Vm-dev] Re: Plan/discussion/communication around new object format

2012-06-03 Thread Levente Uzonyi

On Wed, 30 May 2012, Igor Stasenko wrote:



> Here are couple (2) of mine, highly valuable cents :)
>
> 2^20 for classes?
> might be fine (or even overkill) for smalltalk systems, but could be
> quite limiting for one who would like experiment and implementing a
> prototype-based frameworks,
> where every object is a "class" by itself.


I think it's more important to have a fast Smalltalk VM than one that is
slower but might fit a concrete experiment which might happen sometime
and would get some performance boost from the implementation.



> ---
> 8: slot size (255 => extra header word with large size)
> 3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for
> pointer, 7 => # fixed fields is in the class)
> 4 bits: format (pointers, indexable, bytes/shorts/longs/doubles
> indexable, compiled method, ephemerons, weak, etc)
> 1: immutability
> 3: GC 2 mark bits. 1 forwarded bit
> 20: identity hash
> 20: class index
> ---
> what takes most of the space in object header? right - hash!
> Now, since we will have lazy become i am back to my idea of having
> extra & arbitrary properties
> per object.
>
> In a nutshell, the idea is to not store hash in an object header, but
> instead use just a single bit 'hash present'.
>
> When identity hash of object is requested (via corresponding primitive)
> the implementation could check if 'hash present' is set,
> then if it's not there , we do a 'lazy become' of existing object to same object
> copied into another place, but with hash bit set, and with extra 64-bit field,
> where hash value can be stored.
>
> So, when you requesting an identity hash for object which don't have it,
> the object of from:
> [header][...data.. ]
> copied to new memory region with new layout:
> [header][hash bits][...data..]
>
> and old object, is of course 'corpsed' to forwarding pointer to new location.


The weak point of this idea is that you might run out of memory during
the allocation of the new object if you ask for the identity hash of a larger
object or of many smaller objects at once.




> Next step is going from holding just hash to having an arbitrary &
> dynamic number of extra fields per object.
> In same way, we use 1 extra bit, indicating that object having extra properties.
> And when object don't have it, we lazy-become it from being
> [header][...data.. ]
> or
> [header][hash bits][...data..]
> to:
> [header][hash bits][oop][...data..]
>
> where 'oop' can be anything - instance of Array/Dictionary (depends
> how language-side will decide to store extra properties of object)
>
> This , for instance , would allow us to store extra properties for
> special object formats like variable bytes or compiled methods, which
> don't have the instance variables.
>
> Not need to mention, how useful being able to attach extra properties
> per object, without changing the object's class.
> And , of course the freed 18 bits (20 - 2) in header can be allocated
> for other purposes.
> (Stef, how many bits you need for experiments? ;)
>
>
>
> About immediates zoo.
>
> Keep in mind, that the more immediates we have, the more complex implementation
> tends to be.
>
> I would just keep 2 data types:
> - integers
> - floats
>
> and third, special 'arbitrary' immediate , which seen by VM as a 60-bit value.
> The interpretation of this value depends on lookup in range-table,
> where developer specifying the correspondence between the value
> interval and class:
> [min .. max] -> class
>
> intervals, of course, cannot overlap.
> Determining a class of such immediate might be slower - O(log2(n)) at
> best (where n is size of range table), but from other side,
> how many different kinds of immediates you can fit into 60-bit value?
> Right, it is 2^60. Much more than proposed 8 isn't? :)
>
> And this extra cost can be mitigated completely by inline cache.
> - in case of regular reference, you must fetch the object's class and
> then compare it with one, stored in cache.
> - in case of immediate reference, you compare immediate value with min
> and max stored in cache fields.
> And if value is in range, you got a cache hit, and free to proceed.
> So, its just 1 extra comparison comparing to 'classic' inline cache.
>
> And, after thinking how inline cache is organized, now you can scratch
> the first my paragraph related to  immediates!
> We really don't need to discriminate between small integers/floats/rest!!
> They could also be nothing more than just one of a range(s) defined in
> our zoo of 'special' immediates!
>
> So, at the end we will have just two kinds of references:
> - zero bit == 0 -- an object pointer
> - zero bit == 1 -- an immediate
>
> Voila!.
>
> We can have real zoo of immediates, and simple implementation to support them.
> And not saying that range-table is provided by language-side, so we're
> free to rearrange them at any moment.
>
> And of course, it doesn't means that VM could not reserve some of the
> ranges for own 'contracted'
> immediates, like Characters, and even class reference for example.
> Think about it :)



I like the idea, but I'm not sure how useful it will be in practice. I'd 
also add characters as third data type. String/Chara

Re: [Pharo-project] [Vm-dev] Re: Plan/discussion/communication around new object format

2012-05-30 Thread Igor Stasenko
Here are a couple (2) of my highly valuable cents :)

2^20 for classes?
Might be fine (or even overkill) for Smalltalk systems, but could be
quite limiting for someone who would like to experiment with implementing
prototype-based frameworks,
where every object is a "class" by itself.

---
8: slot size (255 => extra header word with large size)
3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for
pointer, 7 => # fixed fields is in the class)
4 bits: format (pointers, indexable, bytes/shorts/longs/doubles
indexable, compiled method, ephemerons, weak, etc)
1: immutability
3: GC 2 mark bits. 1 forwarded bit
20: identity hash
20: class index
---
What takes most of the space in the object header? Right - the hash!
Now, since we will have lazy become, I am back to my idea of having
extra & arbitrary properties
per object.
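
For reference, the example layout quoted above can be written down as a table of field widths and packed into one 64-bit word. This Python sketch is only a way to check the arithmetic; the field names and the low-to-high bit order are assumptions, since the quoted layout gives widths only.

```python
# (name, width in bits), in the order listed in the quoted layout.
FIELDS = [
    ("slot_size", 8),
    ("odd_bytes_or_fixed_fields", 3),
    ("format", 4),
    ("immutable", 1),
    ("gc", 3),                 # 2 mark bits + 1 forwarded bit
    ("identity_hash", 20),
    ("class_index", 20),
]

USED_BITS = sum(width for _, width in FIELDS)


def pack(**values):
    """Pack the given field values into a single header word."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack(word):
    """Split a header word back into a dict of field values."""
    result, shift = {}, 0
    for name, width in FIELDS:
        result[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return result
```

The widths add up to 59 bits, which matches the "still leaves 5 bits unused" remark for a 64-bit header, and the two 20-bit fields together take 40 of those 59.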

In a nutshell, the idea is to not store the hash in the object header, but
instead use just a single bit, 'hash present'.

When the identity hash of an object is requested (via the corresponding primitive),
the implementation checks whether 'hash present' is set.
If it is not, we do a 'lazy become' of the existing object to the same object
copied into another place, but with the hash bit set and with an extra 64-bit field
where the hash value can be stored.

So, when you request an identity hash for an object which doesn't have one,
the object of the form:
[header][...data..]
is copied to a new memory region with a new layout:
[header][hash bits][...data..]

and the old object is, of course, 'corpsed' into a forwarding pointer to the new location.
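
A toy Python model of that copy-and-forward step, as a sketch only. The Heap class, the two flag constants and the counter used as a hash source are invented for the illustration; a real implementation would operate on raw memory and pick hash values differently.

```python
import itertools

HASH_PRESENT = 1 << 0    # assumed header flag: a hash word follows the header
FORWARDED = 1 << 1       # assumed header flag: this object is a corpse

_hashes = itertools.count(1)   # stand-in hash source


class Heap:
    def __init__(self):
        self.objects = {}      # address -> [header, word, word, ...]
        self.next_addr = 0

    def allocate(self, header, words):
        addr = self.next_addr
        self.objects[addr] = [header, *words]
        # Reserve at least one body word so that a corpse always has
        # room for its forwarding address (assumption of this model).
        self.next_addr += 1 + max(1, len(words))
        return addr

    def follow(self, addr):
        """Chase forwarding pointers to the live copy."""
        while self.objects[addr][0] & FORWARDED:
            addr = self.objects[addr][1]
        return addr

    def identity_hash(self, addr):
        addr = self.follow(addr)
        header, *body = self.objects[addr]
        if header & HASH_PRESENT:
            return body[0]                 # hash word sits right after the header
        new_hash = next(_hashes)
        new_addr = self.allocate(header | HASH_PRESENT, [new_hash, *body])
        self.objects[addr] = [FORWARDED, new_addr]   # old copy becomes a corpse
        return new_hash
```

Note that every first hash request allocates a fresh copy of the object, so the cost is proportional to the object's size.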

The next step is going from holding just a hash to having an arbitrary &
dynamic number of extra fields per object.
In the same way, we use 1 extra bit, indicating that the object has extra properties.
And when the object doesn't have them, we lazy-become it from being
[header][...data..]
or
[header][hash bits][...data..]
to:
[header][hash bits][oop][...data..]

where 'oop' can be anything - an instance of Array/Dictionary (depending on
how the language side decides to store extra properties of an object).

This, for instance, would allow us to store extra properties for
special object formats like variable bytes or compiled methods, which
don't have instance variables.

No need to mention how useful it is to be able to attach extra properties
per object without changing the object's class.
And, of course, the freed 18 bits (20 - 2) in the header can be allocated
for other purposes.
(Stef, how many bits do you need for experiments? ;)



About the immediates zoo.

Keep in mind that the more immediates we have, the more complex the implementation
tends to be.

I would keep just 2 data types:
 - integers
 - floats

and a third, special 'arbitrary' immediate, which is seen by the VM as a 60-bit value.
The interpretation of this value depends on a lookup in a range table,
where the developer specifies the correspondence between a value
interval and a class:
[min .. max] -> class

Intervals, of course, cannot overlap.
Determining the class of such an immediate might be slower - O(log2(n)) at
best (where n is the size of the range table), but on the other hand,
how many different kinds of immediates can you fit into a 60-bit value?
Right, it is 2^60. Much more than the proposed 8, isn't it? :)
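
The range table lookup can be sketched in Python with a binary search over the interval lower bounds. RangeTable and the example ranges are hypothetical; only the "[min .. max] -> class" mapping and the no-overlap rule come from the text.

```python
import bisect


class RangeTable:
    """Maps non-overlapping inclusive intervals [lo .. hi] to classes."""

    def __init__(self, ranges):
        # ranges: iterable of (lo, hi, cls) tuples
        self.ranges = sorted(ranges, key=lambda r: r[0])
        for (_, hi1, _), (lo2, _, _) in zip(self.ranges, self.ranges[1:]):
            if lo2 <= hi1:
                raise ValueError("intervals must not overlap")
        self.lows = [lo for lo, _, _ in self.ranges]

    def class_of(self, value):
        """Binary search, O(log n) in the number of ranges."""
        i = bisect.bisect_right(self.lows, value) - 1
        if i >= 0:
            lo, hi, cls = self.ranges[i]
            if value <= hi:
                return cls
        return None    # value falls in a gap between ranges
```

Since the table is plain data, the language side could rebuild it at any time without touching the lookup code.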

And this extra cost can be mitigated completely by the inline cache.
- in the case of a regular reference, you must fetch the object's class and
then compare it with the one stored in the cache.
- in the case of an immediate reference, you compare the immediate value with the min
and max stored in the cache fields.
And if the value is in range, you have a cache hit and are free to proceed.
So, it's just 1 extra comparison compared to the 'classic' inline cache.
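
Here is a minimal Python sketch of such a send site for immediates. ImmediateSendSite and the slow_lookup callback are illustrative names; in a real VM the cache fields would be patched into machine code at the send site.

```python
class ImmediateSendSite:
    """Inline cache for one send site whose receiver is an immediate."""

    def __init__(self, slow_lookup):
        # slow_lookup: value -> (lo, hi, method); stands in for the
        # range table search followed by a normal method lookup.
        self.slow_lookup = slow_lookup
        self.lo = self.hi = self.method = None
        self.misses = 0

    def dispatch(self, value):
        # Cache hit: two comparisons (against lo and hi) instead of the
        # single class comparison of a classic inline cache.
        if self.method is not None and self.lo <= value <= self.hi:
            return self.method
        # Cache miss: do the slow lookup and refill the cache fields.
        self.misses += 1
        self.lo, self.hi, self.method = self.slow_lookup(value)
        return self.method
```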

And, after thinking about how the inline cache is organized, you can now scratch
my first paragraph related to immediates!
We really don't need to discriminate between small integers/floats/the rest!!
They could also be nothing more than one of the range(s) defined in
our zoo of 'special' immediates!

So, in the end we will have just two kinds of references:
 - zero bit == 0 -- an object pointer
 - zero bit == 1 -- an immediate

Voila!

We can have a real zoo of immediates, and a simple implementation to support them.
Not to mention that the range table is provided by the language side, so we're
free to rearrange the ranges at any moment.

And of course, it doesn't mean that the VM could not reserve some of the
ranges for its own 'contracted'
immediates, like Characters, and even class references for example.
Think about it :)



Re: [Pharo-project] [Vm-dev] Re: Plan/discussion/communication around new object format

2012-05-30 Thread Eliot Miranda
On Wed, May 30, 2012 at 1:53 PM, Mariano Martinez Peck <
marianop...@gmail.com> wrote:

>
>
>
> On Wed, May 30, 2012 at 10:22 PM, Eliot Miranda 
> wrote:
>
>>
>>
>>
>> On Wed, May 30, 2012 at 12:59 PM, Stéphane Ducasse <
>> stephane.duca...@inria.fr> wrote:
>>
>>> I would like to be sure that we can have
>>>- bit for immutable objects
>>>- bits for experimenting.
>>>
>>
>> There will be quite a few.  And one will be able to steal bits from the
>> class field if one needs fewer classes.  I'm not absolutely sure of the
>> layout yet.  But for example
>>
>> 8: slot size (255 => extra header word with large size)
>> 3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for
>> pointer, 7 => # fixed fields is in the class)
>> 4 bits: format (pointers, indexable, bytes/shorts/longs/doubles
>> indexable, compiled method, ephemerons, weak, etc)
>> 1: immutability
>> 3: GC 2 mark bits. 1 forwarded bit
>> 20: identity hash
>>
>
> and we can make it lazy, that is, we compute it not at instantiation time
> but rather the first time the primitive is call.
>
>
>
>> 20: class index
>>
>
> This would probably work for a while. I think that it would be good to let
> an "open door" so that in the future we can just add one more word for a
> class pointer.
>

Turns out that's not such a simple change.  Class indices have two
advantages.  One is that they're more compact (2^20 classes is still a lot
of classes).  The other is that they're constant, which has two main
benefits.  First, in method caches and in-line caches the class field holds
an index and hence doesn't need to be updated by the GC.  The GC no longer
has to visit send sites.  Second, because they're constants, both checking
for well-known classes and instantiating well-known classes can be done
without going to the specialObjectsArray.  One just uses the constant.  Now
undoing these optimizations to open a back door is not trivial.  So best
accept the benefits and exploit them to the maximum.
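
The first benefit can be illustrated with a small Python sketch. ClassTable, MethodCache and the dictionary standing in for a class object are all hypothetical; the sketch only shows that a cache keyed by class index stays valid when the class object itself is relocated.

```python
class ClassTable:
    """Maps small constant indices to class objects."""

    def __init__(self):
        self.entries = []

    def add(self, class_object):
        self.entries.append(class_object)
        return len(self.entries) - 1       # the class index

    def relocate(self, index, moved_class_object):
        # What a moving GC would do: only the table entry changes.
        self.entries[index] = moved_class_object


class MethodCache:
    """Cache keyed by (class index, selector), never by class pointer."""

    def __init__(self, table):
        self.table = table
        self.cache = {}

    def lookup(self, class_index, selector):
        key = (class_index, selector)
        if key not in self.cache:
            cls = self.table.entries[class_index]
            self.cache[key] = cls["methods"][selector]
        return self.cache[key]
```

After relocate, no cache key needs rewriting, which is the sense in which the GC no longer has to visit send sites.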


>
>>
>> still leaves 5 bits unused.  And stealing 4 bits each from class index
>> still leaves 64k classes.  So this format is simple and provides lots of
>> unused bits.  The format field is a great idea as it combines a number of
>> orthogonal properties in very few bits.  I don't want to include odd bytes
>> in format because I think a separate field that holds odd bytes and fixed
>> fields is better use of space.  But we can gather statistics before
>> deciding.
>>
>>

Re: [Pharo-project] [Vm-dev] Re: Plan/discussion/communication around new object format

2012-05-30 Thread Eliot Miranda
On Wed, May 30, 2012 at 1:57 PM, Stefan Marr wrote:

>
> Hi Eliot:
>
> From my experience with the RoarVM, it seems to be a rather simple
> exercise to enable the VM to support a custom 'pre-header' for objects.
> That is, a constant offset in the memory that comes from the allocator,
> and is normally ignored by the GC.
>

That's a great idea.  So for experimental use one simply throws a whole
word at the problem and forgets about the issue.  Thanks, Stefan.  That
leaves me free to focus on something fast and compact that is still
flexible.  Great!


>
> That allows me to do all kind of stuff. Of course at a cost of a word per
> object, and at the cost of recompiling the VM.
> But, that should be a reasonable price to pay for someone doing research
> on these kind of things.
>
> Sometimes a few bits are just not enough, and such a pre-header gives much
> much more flexibility.
> For the people interested in that, I could dig out the details (I think, I
> did that already once on this list).
>
> Best regards
> Stefan
>
>

Re: [Pharo-project] [Vm-dev] Re: Plan/discussion/communication around new object format

2012-05-30 Thread Stefan Marr
Hi Eliot:

From my experience with the RoarVM, it seems to be a rather simple exercise to 
enable the VM to support a custom 'pre-header' for objects.
That is, a constant offset in the memory that comes from the allocator, and is 
normally ignored by the GC.

That allows me to do all kinds of stuff. Of course at a cost of a word per 
object, and at the cost of recompiling the VM.
But, that should be a reasonable price to pay for someone doing research on 
these kinds of things.

Sometimes a few bits are just not enough, and such a pre-header gives much much 
more flexibility.
For the people interested in that, I could dig out the details (I think, I did 
that already once on this list).
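
A minimal Python sketch of the pre-header idea, assuming a bump allocator over a flat array of words. Allocator, PRE_HEADER_WORDS and the accessor names are made up for the illustration and are not taken from the RoarVM.

```python
PRE_HEADER_WORDS = 1    # assumed compile-time constant of the VM build


class Allocator:
    """Bump allocator that reserves extra words in front of every object."""

    def __init__(self, size_in_words):
        self.memory = [0] * size_in_words
        self.free = 0

    def allocate(self, header, body):
        body = list(body)
        needed = PRE_HEADER_WORDS + 1 + len(body)
        if self.free + needed > len(self.memory):
            raise MemoryError("out of space")
        # The object pointer addresses the header, so code that knows
        # nothing about the pre-header keeps working unchanged.
        oop = self.free + PRE_HEADER_WORDS
        self.memory[oop] = header
        self.memory[oop + 1:oop + 1 + len(body)] = body
        self.free += needed
        return oop

    def pre_header(self, oop):
        return self.memory[oop - PRE_HEADER_WORDS]

    def set_pre_header(self, oop, value):
        self.memory[oop - PRE_HEADER_WORDS] = value
```

Setting PRE_HEADER_WORDS to 0 gives back the plain layout, which mirrors the "recompile the VM" trade-off.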

Best regards
Stefan


On 30 May 2012, at 22:22, Eliot Miranda wrote:

> 
> 
> On Wed, May 30, 2012 at 12:59 PM, Stéphane Ducasse 
>  wrote:
> I would like to be sure that we can have
>- bit for immutable objects
>- bits for experimenting.
> 
> There will be quite a few.  And one will be able to steal bits from the class 
> field if one needs fewer classes.  I'm not absolutely sure of the layout yet. 
>  But for example
> 
> 8: slot size (255 => extra header word with large size)
> 3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for 
> pointer, 7 => # fixed fields is in the class)
> 4 bits: format (pointers, indexable, bytes/shorts/longs/doubles indexable, 
> compiled method, ephemerons, weak, etc)
> 1: immutability
> 3: GC 2 mark bits. 1 forwarded bit
> 20: identity hash
> 20: class index
> 
> still leaves 5 bits unused.  And stealing 4 bits each from class index still 
> leaves 64k classes.  So this format is simple and provides lots of unused 
> bits.  The format field is a great idea as it combines a number of orthogonal 
> properties in very few bits.  I don't want to include odd bytes in format 
> because I think a separate field that holds odd bytes and fixed fields is 
> better use of space.  But we can gather statistics before deciding.
> 
> 

Re: [Pharo-project] [Vm-dev] Re: Plan/discussion/communication around new object format

2012-05-30 Thread Mariano Martinez Peck
On Wed, May 30, 2012 at 10:22 PM, Eliot Miranda wrote:

>
>
>
> On Wed, May 30, 2012 at 12:59 PM, Stéphane Ducasse <
> stephane.duca...@inria.fr> wrote:
>
>> I would like to be sure that we can have
>>- bit for immutable objects
>>- bits for experimenting.
>>
>
> There will be quite a few.  And one will be able to steal bits from the
> class field if one needs fewer classes.  I'm not absolutely sure of the
> layout yet.  But for example
>
> 8: slot size (255 => extra header word with large size)
> 3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for
> pointer, 7 => # fixed fields is in the class)
> 4 bits: format (pointers, indexable, bytes/shorts/longs/doubles indexable,
> compiled method, ephemerons, weak, etc)
> 1: immutability
> 3: GC 2 mark bits. 1 forwarded bit
> 20: identity hash
>

and we can make it lazy, that is, we compute it not at instantiation time
but rather the first time the primitive is called.
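
As a sketch of what "lazy" could mean here (Python, with invented names; treating 0 as "not yet assigned" and drawing the value at random are assumptions, not something stated in the design):

```python
import random

HASH_BITS = 20    # width of the identity hash field in the example layout


class ObjectHeader:
    def __init__(self):
        self.hash_field = 0    # 0 means no hash assigned yet (assumption)


def identity_hash(header, rng):
    """Assign the hash on first request, then keep answering the same value."""
    if header.hash_field == 0:
        # randrange excludes the upper bound, so the result fits in
        # HASH_BITS bits and is never the 0 sentinel.
        header.hash_field = rng.randrange(1, 1 << HASH_BITS)
    return header.hash_field
```

Objects that are never hashed never pay for generating a value.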



> 20: class index
>

This would probably work for a while. I think that it would be good to leave
an "open door" so that in the future we can just add one more word for a
class pointer.


>
> still leaves 5 bits unused.  And stealing 4 bits each from class index
> still leaves 64k classes.  So this format is simple and provides lots of
> unused bits.  The format field is a great idea as it combines a number of
> orthogonal properties in very few bits.  I don't want to include odd bytes
> in format because I think a separate field that holds odd bytes and fixed
> fields is better use of space.  But we can gather statistics before
> deciding.
>
>
>> Stef
>>
>> On May 30, 2012, at 8:48 AM, Stéphane Ducasse wrote:
>>
>> > Hi guys
>> >
>> > Here is an important topic I would like to see discussed so that we see
>> how we can improve and join forces.
>> > May a mail discussion then a wiki for the summary would be good.
>> >
>> >
>> > stef
>> >
>> >
>> >
>> > Begin forwarded message:
>> >
>> >> From: Eliot Miranda 
>> >> Subject: Re: Plan/discussion/communication around new object format
>> >> Date: May 27, 2012 10:49:54 PM GMT+02:00
>> >> To: Stéphane Ducasse 
>> >>
>> >>
>> >>
>> >> On Sat, May 26, 2012 at 1:46 AM, Stéphane Ducasse <
>> stephane.duca...@inria.fr> wrote:
>> >> Hi eliot
>> >>
>> >> do you have a description of the new object format you want to
>> introduce?
>> >>
>> >> The design is in the class comment of CogMemoryManager in the Cog
>> VMMaker packages.
>> >>
>> >> Then what is your schedule?
>> >>
>> >> This is difficult. I have made a small start and should be able to
>> spend time on it starting soon.  I want to have it finished by early next
>> year.  But it depends on work schedules etc.
>> >>
>> >>
>> >> I would like to see if we can allocate igor/esteban time before we run
>> out of money
>> >> to help on that important topic.
>> >> Now the solution is unclear and I did not see any document where we
>> can evaluate
>> >> and plan how we can help. So do you want help on that topic? Then how
>> can people
>> >> contribute if any?
>> >>
>> >> The first thing to do is to read the design document, to see if the
>> Pharo community thinks it is the right direction, and to review it, spot
>> deficiencies etc.  So please get those interested to read the class comment
>> of CogMemoryManager in the latest VMMaker.oscog.
>> >>
>> >> Here's the current version of it:
>> >>
>> >> CogMemoryManager is currently a place-holder for the design of the new
>> Cog VM's object representation and garbage collector.  The goals for the GC
>> are
>> >>
>> >> - efficient object representation a la Eliot Miranda's VisualWorks
>> 64-bit object representation that uses a 64-bit header, eliminating direct
>> class references so that all objects refer to their classes indirectly.
>>  Instead the header contains a constant class index, in a field smaller
>> than a full pointer, These class indices are used in inline and first-level
>> method caches, hence they do not have to be updated on GC (although they do
>> have to be traced to be able to GC classes).  Classes are held in a sparse
>> weak table.  The class table needs only to be indexed by an instance's
>> class index in class hierarchy search, in the class primitive, and in
>> tracing live objects in the heap.  The additional header space is allocated
>> to a much expanded identity hash field, reducing hash efficiency problems
>> in identity collections due to the extremely small (11 bit) hash field in
>> the old Squeak GC.  The identity hash field is also a key element of the
>> class index scheme.  A class's identity hash is its index into the class
>> table, so to create an instance of a class one merely copies its identity
>> hash into the class index field of the new instance.  This implies that
>> when classes gain their identity hash they are entered into the class table
>> and their identity hash is that of a previously unused index in the table.
>>  It also implies that there is a maximum number of classes in the tab