Hello,

I like the idea, but it is not that simple.

Some frameworks use distinct strings with the same contents as markers
that are deliberately not identical (==).

Typically:

Object>>#string1
    ^ 'string'

Object>>#string2
    ^ 'string'

Object>>#test
    self assert: self string1 == self string1. "Answers true"
    self assert: self string2 == self string2. "Answers true"
    self assert: (self string1 == self string2) not "The comparison answers false"

Frameworks that rely on this will no longer work once equal strings are merged.

This kind of bug is also hard to spot: it typically breaks identity
collections in a nondeterministic fashion.

Regards


2014-05-30 9:39 GMT+02:00 Philippe Marschall <
philippe.marsch...@netcetera.ch>:

> Hi
>
> This is an idea I stole from somebody else. The assumption is that you
> have a lot of Strings in the image that are equal. We could therefore
> remove the duplicates and make all the objects refer to the same instance.
>
> However, it's not as simple as that. The main issue is that String has two
> responsibilities. The first is as an immutable value object. The second is
> as a mutable character buffer for building immutable value objects. We must
> not deduplicate the second kind. Unfortunately, it's not straightforward to
> figure out which kind a string is. The approach I took is looking at
> whether it contains any 0 characters. Another option would be to check
> whether any WriteStreams are referring to it.
> Also, since there are behavioral differences between String and Symbol
> besides #=, we must exclude Symbols (e.g. there is #'hello' and 'hello' in
> the heap and they compare #= true, but we must not make anybody who refers
> to 'hello' suddenly refer to #'hello').
>
> Anyway here's the code, this saves about 2 MB in a fairly stock Pharo 3
> image. Sorry for the bad variable names.
>
> | b d m |
> b := Bag new.
> d := OrderedCollection new.
> m := Dictionary new.
> "count all string instances"
> String allSubInstancesDo: [ :s |
>     s isSymbol ifFalse: [
>         b add: s ] ].
> "find the ones that have no duplicates or are likely buffers"
> b doWithOccurrences: [ :s :i |
>     (i = 1 or: [ s anySatisfy: [ :c | c codePoint = 0 ] ]) ifTrue: [
>         d add: s -> i ] ].
> "remove the ones that have no duplicates or are likely buffers"
> d do: [ :a |
>     a value timesRepeat: [
>         b remove: a key ]  ].
> "map all duplicate strings to their duplicates"
> String allSubInstancesDo: [ :s |
>     s isSymbol ifFalse: [
>         (b includes: s) ifTrue: [
>             | l |
>             l := m at: s ifAbsentPut: [ OrderedCollection new ].
>             l add: s ] ] ].
> "remove the duplicates"
> m keysAndValuesDo: [ :k :v |
>     | f |
>     f := v at: 1.
>     2 to: v size do: [ :i |
>         (v at: i) becomeForward: f ] ]
>
> Cheers
> Philippe
>
>
>
