On Wed, 10 Mar 2010, Randal L. Schwartz wrote:
"Randal" == Randal L Schwartz <mer...@stonehenge.com> writes:
Randal> I can envision collection types where collect: can create something
Randal> that asSet can more easily turn into a set in bulk than adding each
Randal> element individually like this.
Keys of dictionaries, but it's hackish.
In particular, turning an array of 10,000 elements into a set
can start with a particular hashing size, whereas starting with
an empty set and adding 10,000 elements will likely involve rehasing
a number of times. If 10,000 isn't troubling enough, make that
a million. :)
Lets see what we've got (in Squeak):
Smalltalk garbageCollect.
[ (1 to: 1000000) collect: #yourself as: Set ] timeToRun. "==> 2651."
Smalltalk garbageCollect.
[ (1 to: 1000000) asSet ] timeToRun. "==> 2522".
| s |
s := Set new.
Smalltalk garbageCollect.
[ 1 to: 1000000 do: [ :each | s add: each ] ] timeToRun. "==> 1192"
| s |
s := Set new: 1000000.
Smalltalk garbageCollect.
[ 1 to: 1000000 do: [ :each | s add: each ] ] timeToRun. "==> 472"
Note that adding numbers from 1 to n has the lowest cost for addition,
since there will be no hash collisions at all.
So try it with randomized data:
a := (1 to: 1000000) asArray shuffled.
Smalltalk garbageCollect.
[ a collect: #yourself as: Set ] timeToRun. "==> 4143"
Smalltalk garbageCollect.
[ a asSet ] timeToRun. "==> 3995".
| s |
s := Set new.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 157304 oops weak hashes".
| s |
s := Set new: 1000000.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 636".
With better hashes:
a := Set new: 1000000.
[ a size < 1000000 ] whileTrue: [
a add: SmallInteger maxVal atRandom ].
a := a asArray.
Smalltalk garbageCollect.
[ a collect: #yourself as: Set ] timeToRun. "==> 3895"
Smalltalk garbageCollect.
[ a asSet ] timeToRun. "==> 3777".
| s |
s := Set new.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 4729 much better".
| s |
s := Set new: 1000000.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 774".
Conclusions:
- #collect:as: is (almost) as fast as #asSet
- better hashes can make a huge difference
- message sends cost a lot compared to bytecodes
- if you can guess the size of your collection, do it (applies for all
growing collections and collection streaming)
- only optimize if you have to in your own code
Levente
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<mer...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
_______________________________________________
Pharo-project mailing list
Pharo-project@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
_______________________________________________
Pharo-project mailing list
Pharo-project@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project