I hope you're only considering this one-time for your Pharo release
image, and not something that will _continue_ to operating on an
on-going basis.  Attempting to do that below app-layer would be way
too intrusive for, for example, a Magma application..    A String read
from one DB session belongs to that session.  If it were canonicalized
and suddenly belonged to two sessions, then mutating it would end up
committing that change in both sessions.  Bad.

Strings aren't the only type to share.  What about Dates?  And
literal-Array's?  Timezones and Durations?  This is an
app-responsibility, not a system one.  A one-time compress prior to
release is okay, but not automatically behind the scenes..


On Fri, May 30, 2014 at 2:39 AM, Philippe Marschall
<philippe.marsch...@netcetera.ch> wrote:
> Hi
>
> This is an idea I stole from somebody else. The assumption is that you have
> a lot of Strings in the image that are equal. We could therefore remove the
> duplicates and make all the objects refer to the same instance.
>
> However it's not a simple as that. The main issue is that String has two
> responsibilities. The first is as an immutable value object. The second is
> as a mutable character buffer for building immutable value objects. We must
> not deduplicate the second kind. Unfortunately it's not straight forward to
> figure out which kind a string is. The approach I took is looking at whether
> it contains any 0 characters. An other option would be to check whether any
> WirteStreams are referring to it.
> Also, since there are behavioral differences between String and Symbol
> besides #= we must exclude Symbols (eg. there is #'hello' and 'hello' in the
> heap and they compare #= true but we must not make anybody who refers to
> 'hello' suddenly refer to #'hello').
>
> Anyway here's the code, this saves about 2 MB in a fairly stock Pharo 3
> image. Sorry for the bad variable names.
>
> | b d m |
> b := Bag new.
> d := OrderedCollection new.
> m := Dictionary new.
> "count all string instances"
> String allSubInstancesDo: [ :s |
>     s isSymbol ifFalse: [
>         b add: s ] ].
> "find the ones that have no duplicates or are likely buffers"
> b doWithOccurrences: [ :s :i |
>     (i = 1 or: [ s anySatisfy: [ :c | c codePoint = 0 ] ]) ifTrue: [
>         d add: s -> i ] ].
> "remove the ones that have no duplicates or are likely buffers"
> d do: [ :a |
>     a value timesRepeat: [
>         b remove: a key ]  ].
> "map all duplicate strings to their duplicates"
> String allSubInstancesDo: [ :s |
>     s isSymbol ifFalse: [
>         (b includes: s) ifTrue: [
>             | l |
>             l := m at: s ifAbsentPut: [ OrderedCollection new ].
>             l add: s  ] ].
> "remove the duplicates"
> m keysAndValues do [ :k :v |
>     | f |
>     f := v at: 1.
>     2 to: v size do: [ :i |
>         (v at: i) becomeForward: f ] ]
>
> Cheers
> Philippe
>
>

Reply via email to