Hmm, code that depends on two #= Strings not being #== ?
That would be either a very special case or, more likely, a bug.
In any case, it won't be very common, I guess.

Mutability vs. immutability is a far more important issue, and IMHO unfixable 
unless we introduce a new class.

But saving 2MB is impressive.

On 30 May 2014, at 10:59, Clément Bera <bera.clem...@gmail.com> wrote:

> Hello,
> 
> I like the idea, but it is not that simple.
> 
> In some frameworks you may use different strings with the same contents as 
> markers that are not identical (#==).
> 
> Typically:
> 
> Object>>#string1
>     ^ 'string'
> 
> Object>>#string2
>     ^ 'string'
> 
> Object>>#test
>     self assert: self string1 == self string1. "Answers true"
>     self assert: self string2 == self string2. "Answers true"
>     self assert: self string1 == self string2 "Answers false"
> 
> Frameworks relying on that will not work any more.
> 
> And this kind of bug is not easy to spot: it typically crashes identity 
> collections in a non-deterministic fashion.
> 
> Regards
> 
> 
> 2014-05-30 9:39 GMT+02:00 Philippe Marschall 
> <philippe.marsch...@netcetera.ch>:
> Hi
> 
> This is an idea I stole from somebody else. The assumption is that you have a 
> lot of Strings in the image that are equal. We could therefore remove the 
> duplicates and make all the objects refer to the same instance.
> 
> However, it's not as simple as that. The main issue is that String has two 
> responsibilities. The first is as an immutable value object. The second is as 
> a mutable character buffer for building immutable value objects. We must not 
> deduplicate the second kind. Unfortunately it's not straightforward to 
> figure out which kind a string is. The approach I took is looking at whether 
> it contains any NUL characters (code point 0). Another option would be to 
> check whether any WriteStreams are referring to it.
> Also, since there are behavioral differences between String and Symbol 
> besides #= we must exclude Symbols (e.g. there is #'hello' and 'hello' in the 
> heap and they compare #= true but we must not make anybody who refers to 
> 'hello' suddenly refer to #'hello').
> 
> Anyway here's the code, this saves about 2 MB in a fairly stock Pharo 3 
> image. Sorry for the bad variable names.
> 
> | b d m |
> b := Bag new.
> d := OrderedCollection new.
> m := Dictionary new.
> "count all string instances"
> String allSubInstancesDo: [ :s |
>     s isSymbol ifFalse: [
>         b add: s ] ].
> "find the ones that have no duplicates or are likely buffers"
> b doWithOccurrences: [ :s :i |
>     (i = 1 or: [ s anySatisfy: [ :c | c codePoint = 0 ] ]) ifTrue: [
>         d add: s -> i ] ].
> "remove the ones that have no duplicates or are likely buffers"
> d do: [ :a |
>     a value timesRepeat: [
>         b remove: a key ]  ].
> "group the remaining strings: map each string value to all of its instances"
> String allSubInstancesDo: [ :s |
>     s isSymbol ifFalse: [
>         (b includes: s) ifTrue: [
>             | l |
>             l := m at: s ifAbsentPut: [ OrderedCollection new ].
>             l add: s ] ] ].
> "remove the duplicates by forwarding each one to the first instance"
> m keysAndValuesDo: [ :k :v |
>     | f |
>     f := v at: 1.
>     2 to: v size do: [ :i |
>         (v at: i) becomeForward: f ] ]
> 
> Cheers
> Philippe
> 
> 
> 

