Yes I was wondering about the same (for athens too :)) Stef
On Aug 2, 2011, at 2:16 PM, Igor Stasenko wrote: > Does it impacts the text rendering/processing speed? > > > On 2 August 2011 12:11, Nicolas Cellier > <nicolas.cellier.aka.n...@gmail.com> wrote: >> To complete myself, the fast #collect: already exists and is named >> #mapValues: except that it modifies the RunArray in place and also >> won't coalesce... >> >> I also gain a huge factor for #collect:as: be defining this method: >> >> RunArray>>fillFrom: aCollection with: aBlock >> "Evaluate aBlock with each of aCollections's elements as the argument. >> Collect the resulting values into self. Answer self." >> | newRuns newValues lastLength lastValue | >> newRuns := (Array new: aCollection size) writeStream. >> newValues := (Array new: aCollection size) writeStream. >> lastLength := 0. >> lastValue := Object new. >> aCollection do: [:each | >> | value | >> value := aBlock value: each. >> lastValue = value >> ifTrue: [lastLength := lastLength + 1] >> ifFalse: >> [lastLength > 0 >> ifTrue: >> [newRuns nextPut: lastLength. >> newValues nextPut: lastValue]. >> lastLength := 1. >> lastValue := value]]. >> lastLength > 0 >> ifTrue: >> [newRuns nextPut: lastLength. >> newValues nextPut: lastValue]. >> self setRuns: newRuns contents setValues: newValues contents >> >> [ (Array new: 1000) collect: [:e | 4 atRandom] as: RunArray] bench. >> BEFORE: '25.1 per second.' >> AFTER: '1,080 per second.' >> >> It's worth a few lines of code. >> >> Nicolas >> >> 2011/8/2 Nicolas Cellier <nicolas.cellier.aka.n...@gmail.com>: >>> I played a bit with RunArray, and found some un-optimized features. >>> First, I don't know why RunArray is an ArrayedCollection. It cannot >>> #add: but it can #addFirst: and #addLast:. >>> It cannot #add:withOccurrences: but it can #addLast:times:. Why >>> inventing new selectors for old behaviours ? >>> These operations will cost a realloc it the last value is different, >>> so the underlying runs/values could better be an OrderedCollection if >>> these operations are used often. >>> A RunArray cannot remove at all. >>> Very weird collection species, I don't like the implementation too much. >>> >>> Then, #do: loops could be far faster. They rely on ArrayedCollection >>> which inlines do: loops with #to:do: and #at: >>> But #at: is not that fast. Scanning the runs and counting elements >>> would result in a n^2 cost. >>> Fortunately there is a cache lastIndex,lastRun,lastOffset to keep a cost n. >>> Nonetheless, all the tests cost, and the loop is suboptimal. >>> Let use see: >>> >>> version 1: >>> RunArray>>fastDo: aBlock >>> runs with: values do: [:r :v | >>> r timesRepeat: [aBlock value: v]]. >>> >>> | tmp | >>> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray. >>> { >>> [ tmp do: [:e |]] bench. >>> [ tmp fastDo: [:e |]] bench. >>> } >>> #('3,220 per second.' '6,290 per second.') >>> >>> But timesRepeat: is slow, it is unoptimized by the compiler and costs >>> a message send. >>> I think we should implement BlockClosure>>repeat: and optimize that >>> call in Compiler. >>> But let's not do it, and rather inline by ourself: >>> >>> version 2: >>> runs with: values do: [:r :v | >>> 1 to: r do: [:i | aBlock value: v]]. >>> >>> | tmp | >>> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray. >>> { >>> [ tmp do: [:e |]] bench. >>> [ tmp do2: [:e |]] bench. >>> } >>> #('3,070 per second.' '25,500 per second.') >>> >>> We can even inline the with:do: loop itself: >>> version 3: >>> 1 to: runs size do: [:i | >>> | r v | >>> v := values at: i. >>> r := runs at: i. >>> [( r := r - 1) >= 0] >>> whileTrue: [aBlock value: v]]. >>> >>> | tmp | >>> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray. >>> { >>> [ tmp do: [:e |]] bench. >>> [ tmp do2: [:e |]] bench. >>> } >>> #('3,370 per second.' '32,200 per second.') >>> >>> Now the operation I wanted to use was reverseDo: so I implemented: >>> RunArray>>fastReverseDo: aBlock >>> | i | >>> i := runs size. >>> [i > 0] >>> whileTrue: >>> [ | r v | >>> v := values at: i. >>> r := runs at: i. >>> i := i - 1. >>> [( r := r - 1) >= 0] >>> whileTrue: [aBlock value: v]]. >>> | tmp | >>> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray. >>> { >>> [ tmp reverseDo: [:e |]] bench. >>> [ tmp reverseDo2: [:e |]] bench. >>> } >>> #('83.9 per second.' '32,600 per second.') >>> >>> Ouch! The cache is missing a lot of indices, and our loop turns into a n^2 >>> cost. >>> I know, premature optimization bla bla bla, but a factor x400 is worth >>> some inlining no? >>> >>> I guess these features are never used. >>> By now RunArray is kind of private utility for Text implementation. >>> But it could / should be generic. >>> >>> I also have proposals for count: / select: / collect:. etc... >>> It would be to evaluate the block only once per group of values. >>> For example >>> RunArray>>collect: aBlock >>> "Beware, the block will be evaluated only once per group of values." >>> ^(self class runs: (runs collect: aBlock) contents values: values >>> copy) coalesce >>> But that's controversial, it would make the RunArray behave >>> differently if the block has side effects... >>> >>> | i tmp tmp2 tmp3 | >>> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]). >>> i := 0. >>> tmp2 := tmp collect: [:e | i := i + 1]. >>> i := 0. >>> tmp3 := (tmp as: RunArray) collect: [:e | i := i + 1]. >>> tmp2 = tmp3 asArray >>> ==> false >>> >>> Nicolas >>> >> >> > > > > -- > Best regards, > Igor Stasenko AKA sig. >