I'm curious to see which methods really rely on mutable strings.
Of course a string is mutated while it is being constructed or written to
through a stream, but then it's just a temporary storage area and we don't
have to care.
We're speaking of a place where a String would be modified to retain some
state...
Of course, since we can't exclude this possibility theoretically, the
proposed change is unsafe.
But practically...


2014-05-30 13:46 GMT+02:00 Marcus Denker <marcus.den...@inria.fr>:

>
> On 30 May 2014, at 10:59, Clément Bera <bera.clem...@gmail.com> wrote:
>
> Hello,
>
> I like the idea, but it is not that simple.
>
> In some frameworks you may use different strings with the same contents as
> markers that are not identical (==).
>
> Typically:
>
> Object>>#string1
>     ^ 'string'
>
> Object>>#string2
>     ^ 'string'
>
> Object>>#test
>     self assert: self string1 == self string1. "Answers true"
>     self assert: self string2 == self string2. "Answers true"
>     self assert: self string1 == self string2 "Answers false"
>
> Frameworks relying on that will not work any more.
>
> And this kind of bug is not easy to spot; it typically breaks identity
> collections in a non-deterministic fashion.
>
>
> With an indirection (a kind of reference) that
>
> -> points to the string
> -> forwards everything, but does a copy on write on state change
> -> implements == to return false
>
>
> it would work. Of course you then have the same number of objects (+1), but
> they would all be very small, which leads to savings for large objects,
> especially when applied to subgraphs.
>
> Marcus
>
>
> Regards
>
>
> 2014-05-30 9:39 GMT+02:00 Philippe Marschall <
> philippe.marsch...@netcetera.ch>:
>
>> Hi
>>
>> This is an idea I stole from somebody else. The assumption is that you
>> have a lot of Strings in the image that are equal. We could therefore
>> remove the duplicates and make all the objects refer to the same instance.
>>
>> However it's not as simple as that. The main issue is that String has two
>> responsibilities. The first is as an immutable value object. The second is
>> as a mutable character buffer for building immutable value objects. We must
>> not deduplicate the second kind. Unfortunately it's not straightforward to
>> figure out which kind a string is. The approach I took is looking at
>> whether it contains any 0 characters. Another option would be to check
>> whether any WriteStreams are referring to it.
>> Also, since there are behavioral differences between String and Symbol
>> besides #= we must exclude Symbols (eg. there is #'hello' and 'hello' in
>> the heap and they compare #= true but we must not make anybody who refers
>> to 'hello' suddenly refer to #'hello').
>>
>> Anyway here's the code, this saves about 2 MB in a fairly stock Pharo 3
>> image. Sorry for the bad variable names.
>>
>> | b d m |
>> b := Bag new.
>> d := OrderedCollection new.
>> m := Dictionary new.
>> "count all string instances"
>> String allSubInstancesDo: [ :s |
>>     s isSymbol ifFalse: [
>>         b add: s ] ].
>> "find the ones that have no duplicates or are likely buffers"
>> b doWithOccurrences: [ :s :i |
>>     (i = 1 or: [ s anySatisfy: [ :c | c codePoint = 0 ] ]) ifTrue: [
>>         d add: s -> i ] ].
>> "remove the ones that have no duplicates or are likely buffers"
>> d do: [ :a |
>>     a value timesRepeat: [
>>         b remove: a key ]  ].
>> "map all duplicate strings to their duplicates"
>> String allSubInstancesDo: [ :s |
>>     s isSymbol ifFalse: [
>>         (b includes: s) ifTrue: [
>>             | l |
>>             l := m at: s ifAbsentPut: [ OrderedCollection new ].
>>             l add: s ] ] ].
>> "remove the duplicates"
>> m keysAndValuesDo: [ :k :v |
>>     | f |
>>     f := v at: 1.
>>     2 to: v size do: [ :i |
>>         (v at: i) becomeForward: f ] ]
>>
>> Cheers
>> Philippe
>>
>>
>>
>
>
