This is probably not responsible for much of the performance difference, but the racket code has (random 23) where it looks like it ought to be (random 26), resulting in a smaller space of possible random keys. This will cause the branch with the lookup to happen somewhat more often than in the python version.
On Fri, Jul 25, 2014 at 1:34 AM, Laurent <laurent.ors...@gmail.com> wrote: > Even more idiomatic: > (time (for ([w words]) > (hash-update! d w add1 0))) > > But it seems to be slightly slower that Robby's version. > > Laurent > > > On Thu, Jul 24, 2014 at 11:26 AM, Robby Findler < > ro...@eecs.northwestern.edu> wrote: > >> Not an answer to your direction question, but this is the more >> idiomatic way to write that and it seems to be a bit faster: >> >> (time (for ([w (in-list words)]) >> (hash-set! d w (add1 (hash-ref d w 0))))) >> >> On Wed, Jul 23, 2014 at 10:54 AM, Pedro Ramos >> <pedropra...@tecnico.ulisboa.pt> wrote: >> > Hi, >> > >> > I've been developing an implementation of Python in Racket, where I'm >> > implementing Python's dictionaries over Racket custom hash tables. >> > >> > While my occasional benchmarks typically show better performance on >> Racket >> > programs than their Python equivalents, Racket's hash tables generally >> seem >> > to be slower than Python's dicts. >> > >> > I've set up this benchmark in Racket: >> > >> > >> > #lang racket >> > >> > (define alphabet "abcdefghijklmnopqrstuvwxyz") >> > (define (random-word n) >> > (build-string n (lambda (x) (string-ref alphabet (random 23))))) >> > >> > (define words (for/list ([k 1000000]) >> > (random-word 3))) >> > (define d (make-hash)) >> > >> > (time (for ([w words]) >> > (if (hash-has-key? d w) >> > (hash-set! d w (add1 (hash-ref d w))) >> > (hash-set! d w 1)))) >> > >> > >> > And its equivalent in Python: >> > >> > >> > import random >> > import time >> > >> > alphabet = "abcdefghijklmnopqrstuvwxyz" >> > def random_word(n): >> > return ''.join([random.choice(alphabet) for i in range(n)]) >> > >> > words = [random_word(3) for k in xrange(1000000)] >> > d = {} >> > >> > a = time.time() >> > for w in words: >> > if w in d: >> > d[w] = d[w] + 1 >> > else: >> > d[w] = 1 >> > b = time.time() >> > print b-a, 'seconds' >> > >> > >> > The Racket example yields running times of around 500 ms (running on >> Racket >> > v6.0.1) while the Python example yields running times of around 330 ms >> > (running on Python 2.7.3). >> > >> > I find this unusual because Python is somewhat more dynamic than Racket, >> > since >> > (a) Python's equality and hashing functions have to dispatched at >> runtime >> > for each key; >> > (b) referencing and setting values in a Python dict is done using a very >> > general operator, [], whose behaviour also has to be dispatched at >> runtime, >> > unlike the more specific hash-ref and hash-set! Racket functions. >> > >> > Is there something I'm missing about Racket's hash tables which explains >> > this slower speed? >> > >> > Thanks, >> > Pedro Ramos >> > >> > _________________________ >> > Racket Developers list: >> > http://lists.racket-lang.org/dev >> > >> _________________________ >> Racket Developers list: >> http://lists.racket-lang.org/dev >> > > > _________________________ > Racket Developers list: > http://lists.racket-lang.org/dev > > -- A warb degombs the brangy. Your gitch zanks and leils the warb.
_________________________ Racket Developers list: http://lists.racket-lang.org/dev