On Wed, 10 Mar 2010, Randal L. Schwartz wrote:

"Randal" == Randal L Schwartz <mer...@stonehenge.com> writes:

Randal> I can envision collection types where collect: can create something
Randal> that asSet can more easily turn into a set in bulk than adding each
Randal> element individually like this.

Keys of dictionaries, but it's hackish.


In particular, turning an array of 10,000 elements into a set
can start with a particular hashing size, whereas starting with
an empty set and adding 10,000 elements will likely involve rehasing
a number of times.  If 10,000 isn't troubling enough, make that
a million. :)


Lets see what we've got (in Squeak):

Smalltalk garbageCollect.
[ (1 to: 1000000) collect: #yourself as: Set ] timeToRun. "==> 2651."

Smalltalk garbageCollect.
[ (1 to: 1000000) asSet ] timeToRun. "==>  2522".

| s |
s := Set new.
Smalltalk garbageCollect.
[ 1 to: 1000000 do: [ :each | s add: each ] ] timeToRun. "==> 1192"

| s |
s := Set new: 1000000.
Smalltalk garbageCollect.
[ 1 to: 1000000 do: [ :each | s add: each ] ] timeToRun. "==> 472"

Note that adding numbers from 1 to n has the lowest cost for addition, since there will be no hash collisions at all.


So try it with randomized data:

a := (1 to: 1000000) asArray shuffled.

Smalltalk garbageCollect.
[ a collect: #yourself as: Set ] timeToRun. "==> 4143"

Smalltalk garbageCollect.
[ a asSet ] timeToRun. "==> 3995".

| s |
s := Set new.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 157304 oops weak hashes".

| s |
s := Set new: 1000000.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 636".


With better hashes:

a := Set new: 1000000.
[ a size < 1000000 ] whileTrue: [
        a add: SmallInteger maxVal atRandom ].
a := a asArray.

Smalltalk garbageCollect.
[ a collect: #yourself as: Set ] timeToRun. "==>  3895"

Smalltalk garbageCollect.
[ a asSet ] timeToRun. "==>  3777".

| s |
s := Set new.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 4729 much better".

| s |
s := Set new: 1000000.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 774".


Conclusions:
- #collect:as: is (almost) as fast as #asSet
- better hashes can make a huge difference
- message sends cost a lot compared to bytecodes
- if you can guess the size of your collection, do it (applies for all growing collections and collection streaming)
- only optimize if you have to in your own code


Levente

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<mer...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion

_______________________________________________
Pharo-project mailing list
Pharo-project@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
Pharo-project@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Reply via email to