[Pharo-project] Optimizing RunArray

Nicolas Cellier Tue, 02 Aug 2011 01:18:14 -0700

I played a bit with RunArray, and found some un-optimized features.
First, I don't know why RunArray is an ArrayedCollection. It cannot
#add: but it can #addFirst: and #addLast:.
It cannot #add:withOccurrences: but it can #addLast:times:. Why
inventing new selectors for old behaviours ?
These operations will cost a realloc it the last value is different,
so the underlying runs/values could better be an OrderedCollection if
these operations are used often.
A RunArray cannot remove at all.
Very weird collection species, I don't like the implementation too much.


Then, #do: loops could be far faster. They rely on ArrayedCollection
which inlines do: loops with #to:do: and #at:
But #at: is not that fast. Scanning the runs and counting elements
would result in a n^2 cost.
Fortunately there is a cache lastIndex,lastRun,lastOffset to keep a cost n.
Nonetheless, all the tests cost, and the loop is suboptimal.
Let use see:

version 1:
RunArray>>fastDo: aBlock
        runs with: values do: [:r :v |
                r timesRepeat: [aBlock value: v]].

| tmp |
tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
{
[ tmp do: [:e |]] bench.
[ tmp fastDo: [:e |]] bench.
}
#('3,220 per second.' '6,290 per second.')

But timesRepeat: is slow, it is unoptimized by the compiler and costs
a message send.
I think we should implement BlockClosure>>repeat: and optimize that
call in Compiler.
But let's not do it, and rather inline by ourself:

version 2:
        runs with: values do: [:r :v |
                1 to: r do: [:i | aBlock value: v]].

| tmp |
tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
{
[ tmp do: [:e |]] bench.
[ tmp do2: [:e |]] bench.
}
#('3,070 per second.' '25,500 per second.')

We can even inline the with:do: loop itself:
version 3:
        1 to: runs size do: [:i |
                | r v |
                v := values at: i.
                r := runs at: i.
                [( r := r - 1) >= 0]
                        whileTrue: [aBlock value: v]].

| tmp |
tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
{
[ tmp do: [:e |]] bench.
[ tmp do2: [:e |]] bench.
}
#('3,370 per second.' '32,200 per second.')

Now the operation I wanted to use was reverseDo: so I implemented:
RunArray>>fastReverseDo: aBlock
        | i |
        i := runs size.
        [i > 0]
                whileTrue:
                        [ | r v |
                        v := values at: i.
                        r := runs at: i.
                        i := i - 1.
                        [( r := r - 1) >= 0]
                                whileTrue: [aBlock value: v]].
| tmp |
tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
{
[ tmp reverseDo: [:e |]] bench.
[ tmp reverseDo2: [:e |]] bench.
}
 #('83.9 per second.' '32,600 per second.')

Ouch! The cache is missing a lot of indices, and our loop turns into a n^2 cost.
I know, premature optimization bla bla bla, but a factor x400 is worth
some inlining no?

I guess these features are never used.
By now RunArray is kind of private utility for Text implementation.
But it could / should be generic.

I also have proposals for count: / select: / collect:. etc...
It would be to evaluate the block only once per group of values.
For example
RunArray>>collect: aBlock
        "Beware, the block will be evaluated only once per group of values."
        ^(self class runs: (runs collect: aBlock) contents values: values
copy) coalesce
But that's controversial, it would make the RunArray behave
differently if the block has side effects...

| i tmp tmp2 tmp3 |
tmp := ((Array new: 1000) collect: [:e | 4 atRandom]).
i := 0.
tmp2 := tmp collect: [:e | i := i + 1].
i := 0.
tmp3 := (tmp as: RunArray) collect: [:e | i := i + 1].
tmp2 = tmp3 asArray
==> false

Nicolas

[Pharo-project] Optimizing RunArray

Reply via email to