I think I wrote some pretty bad timing code early on, and it led to some 
confusion.
I was essentially comparing the time to do a simple task in both languages 
as well as the effort involved, when it should have been just a simple 
hash table question.
Here's updated, equivalent code in both languages, with timings for each:

Python:

import re
import time
from collections import defaultdict

def timing(f):
    def wrap(*args):
        time1 = time.time()
        ret = f(*args)
        time2 = time.time()
        print('%s function took %0.3f seconds' % (f.__name__, time2 - time1))
        return ret
    return wrap

def get_words():
    # '-' must be escaped inside the character class, otherwise '\t-.' is a range
    return re.split('[ \n\r\t\-.,:_";!]', open("input1.txt").read())

@timing
def todict(words):
    d = defaultdict(int)
    for w in words:
        d[w] += 1
    return d

words = get_words()
todict(words)
todict(words)
todict(words)
todict(words)

todict function took 0.071 seconds
todict function took 0.067 seconds
todict function took 0.067 seconds
todict function took 0.067 seconds
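One gotcha with the split regex, in case anyone copies it: an unescaped `-` between two characters inside `[...]` defines a range, so `\t-.` would match every code point from tab through `.` rather than just the listed characters; escaping it (`\-`) or putting it last keeps only the literal set. A quick standalone check (not part of the benchmark):

```python
import re

# Unescaped '-' between '\t' and '.' forms a range, so '(' (0x28) matches too
ranged = re.compile(r'[ \n\r\t-.,:_";!]')
# Escaped '-' matches only the literal characters listed
literal = re.compile(r'[ \n\r\t\-.,:_";!]')

print(bool(ranged.match('(')))   # True: '(' falls inside the \t-. range
print(bool(literal.match('(')))  # False: '(' is not in the literal set
print(bool(literal.match('-')))  # True: the escaped '-' itself still matches
```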

Julia:

function todict(words)
    counts=Dict{SubString{UTF8String},Int}()
    sizehint(counts, 100000)
    w=SubString("blah",1)
    for w in words
        counts[w] = get(counts,w,0)+1
    end
    return counts
end

function get_words()
    fn="input1.txt"
    s=readall(open(fn))
    spl=Set([' ','\n','\r','\t','-','.',',',':','_','"',';','!'])
    words=split(s, spl, false)
    return words
end

words = get_words()

@time c=todict(words)
@time c=todict(words)
@time c=todict(words)
@time c=todict(words)

elapsed time: 0.212277873 seconds (6076712 bytes allocated)
elapsed time: 0.146928623 seconds (2228848 bytes allocated)
elapsed time: 0.142991681 seconds (2228848 bytes allocated)
elapsed time: 0.144675704 seconds (2228848 bytes allocated)


So Julia is still about 3x slower on the first run and about 2x slower thereafter.
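As an aside, and separate from the timing comparison: on the Python side, the standard collections.Counter does the same counting in one call as the defaultdict loop. This is purely a convenience, not a speed claim:

```python
from collections import Counter

# Toy input standing in for the word list from input1.txt
words = "the quick fox and the lazy dog and the cat".split()

counts = Counter(words)          # same mapping the defaultdict loop builds
print(counts["the"])             # -> 3
print(counts.most_common(2))     # -> [('the', 3), ('and', 2)]
```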

Thanks,
Roman

On Tuesday, March 4, 2014 12:15:21 AM UTC-8, Roman Sinayev wrote:
>
> Why is Julia 2x slower than Python on this test?
>
> https://gist.github.com/lqdc/9342237
>
