On 30 May 2014, at 09:39, Philippe Marschall <philippe.marsch...@netcetera.ch> 
wrote:

> Hi
> 
> This is an idea I stole from somebody else. The assumption is that you have a 
> lot of Strings in the image that are equal. We could therefore remove the 
> duplicates and make all the objects refer to the same instance.
> 
> However it's not as simple as that. The main issue is that String has two 
> responsibilities. The first is as an immutable value object. The second is as 
> a mutable character buffer for building immutable value objects. We must not 
> deduplicate the second kind. Unfortunately it's not straightforward to 
> figure out which kind a string is. The approach I took is looking at whether 
> it contains any 0 characters. Another option would be to check whether any 
> WriteStreams are referring to it.

One idea could be to have an explicit immutable string literal class (or to 
have the compiler mark literals as immutable once we have immutability 
support).
These would then be safe to de-duplicate. It would be interesting to know what 
percentage of your de-duplicated strings are literals.
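
A rough sketch of the heuristic quoted above (skip anything that contains a 0 
character, never touch Symbols), written in Python rather than Smalltalk so 
that it is self-contained. The names `Str`, `Sym`, `looks_like_buffer` and 
`deduplicate` are made up for illustration, and the heap is modelled as a plain 
list of slot lists. This is not Philippe's code, only one way the idea could 
look:

```python
class Str:
    """Stand-in for a Smalltalk String: mutable, compared by content."""
    def __init__(self, text):
        self.chars = list(text)

    def key(self):
        return "".join(self.chars)


class Sym(Str):
    """Stand-in for a Symbol; must never be merged with a plain Str."""


def looks_like_buffer(s):
    # Heuristic from the thread: a 0 (NUL) character suggests a partially
    # filled character buffer rather than a finished value.
    return "\0" in s.chars


def deduplicate(holders):
    """Redirect equal Str slots to one instance; return how many were redirected.

    `holders` is a list of slot lists, a toy model of objects on the heap.
    """
    canonical = {}
    redirected = 0
    for slots in holders:
        for i, obj in enumerate(slots):
            # Exact class check: Sym (and any other subclass) is left alone.
            if type(obj) is not Str or looks_like_buffer(obj):
                continue
            first = canonical.setdefault(obj.key(), obj)
            if first is not obj:
                slots[i] = first
                redirected += 1
    return redirected
```

With an explicit immutable literal class, `looks_like_buffer` would presumably 
be replaced by a simple class (or immutability flag) check.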

When playing with first-class references (which can in addition override 
behavior), we had the idea that one could use them to realise a general “copy 
on write” for any kind of object, and even for whole object graphs. The idea 
would be to search for object graphs that are the same and replace all the 
pointers to them with pointer-objects that trap all writes to one single 
copy. 
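
To make the “pointer-object that traps writes” idea concrete, here is a minimal 
Python sketch. `CowRef` and `Point` are hypothetical names, and the sketch only 
traps attribute writes made directly through the reference; it says nothing 
about how first-class references would actually be implemented in the VM:

```python
import copy


class Point:
    """Example target object, only here to make the sketch runnable."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


class CowRef:
    """Hypothetical reference object: reads go to a shared target,
    the first write replaces the target with a private deep copy."""

    def __init__(self, shared):
        # Bypass our own __setattr__ so setup does not trigger a copy.
        object.__setattr__(self, "_target", shared)
        object.__setattr__(self, "_owned", False)

    def __getattr__(self, name):
        # Only called for names not found on the reference itself,
        # so every ordinary read is forwarded to the current target.
        return getattr(self._target, name)

    def __setattr__(self, name, value):
        if not self._owned:
            # First write: stop sharing and work on a private copy.
            object.__setattr__(self, "_target", copy.deepcopy(self._target))
            object.__setattr__(self, "_owned", True)
        setattr(self._target, name, value)


shared = Point(1, 2)
r1 = CowRef(shared)
r2 = CowRef(shared)
r1.x = 10  # r1 now has its own copy; r2 and `shared` are untouched
```

Writes to objects deeper in the graph that were reached through a read are not 
trapped here; a real implementation would have to hand out references for 
those as well.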

In a second step one could combine that with “crystallising” unused subgraphs 
(that is, serialising and compressing them in memory, with references to the 
graph decompressing it transparently on demand). Combined, this would save a 
lot of space.

But the devil is in the detail: how to find unused and equal subgraphs 
efficiently.

> Also, since there are behavioral differences between String and Symbol 
> besides #= we must exclude Symbols (e.g. there is #'hello' and 'hello' in the 
> heap and they compare #= true but we must not make anybody who refers to 
> 'hello' suddenly refer to #'hello').
> 
> Anyway here's the code, this saves about 2 MB in a fairly stock Pharo 3 
> image. Sorry for the bad variable names.
> 
Nice!

        Marcus
