I think I wrote some pretty bad timing code early on, and it led to some confusion. I was essentially measuring how long a simple task took in both languages and comparing the effort involved, but it should really have been just a simple hash-table question. Here's updated equivalent code in both langs, with timings for each:
Python:

import re
import time
from collections import defaultdict

def timing(f):
    def wrap(*args):
        time1 = time.time()
        ret = f(*args)
        time2 = time.time()
        print('%s function took %0.3f seconds' % (f.func_name, time2 - time1))
        return ret
    return wrap

def get_words():
    return re.split('[ \n\r\t-.,:_";!]', open("input1.txt").read())

@timing
def todict(words):
    d = defaultdict(int)
    for w in words:
        d[w] += 1
    return d

words = get_words()
todict(words)
todict(words)
todict(words)
todict(words)

todict function took 0.071 seconds
todict function took 0.067 seconds
todict function took 0.067 seconds
todict function took 0.067 seconds

Julia:

function todict(words)
    counts = Dict{SubString{UTF8String},Int}()
    sizehint(counts, 100000)
    for w in words
        counts[w] = get(counts, w, 0) + 1
    end
    return counts
end

function get_words()
    fn = "input1.txt"
    s = readall(open(fn))
    spl = Set([' ','\n','\r','\t','-','.',',',':','_','"',';','!'])
    words = split(s, spl, false)
    return words
end

words = get_words()
@time c = todict(words)
@time c = todict(words)
@time c = todict(words)
@time c = todict(words)

elapsed time: 0.212277873 seconds (6076712 bytes allocated)
elapsed time: 0.146928623 seconds (2228848 bytes allocated)
elapsed time: 0.142991681 seconds (2228848 bytes allocated)
elapsed time: 0.144675704 seconds (2228848 bytes allocated)

So Julia is still about 3x slower on the first (compiling) run and about 2x slower thereafter.

Thanks,
Roman

On Tuesday, March 4, 2014 12:15:21 AM UTC-8, Roman Sinayev wrote:
>
> Why is Julia 2x slower than Python on this test?
>
> https://gist.github.com/lqdc/9342237
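For what it's worth, the Python side of this can also be written with the stdlib collections.Counter, which does the same hash-table counting loop. A minimal sketch (the word list is inlined here instead of being read from input1.txt, just to keep it self-contained):

```python
from collections import Counter

# Same idea as todict(): count occurrences of each word in a hash table.
# The word list below is a placeholder, not the contents of input1.txt.
words = ["the", "cat", "sat", "on", "the", "mat", "the"]

counts = Counter(words)

print(counts["the"])  # 3
print(counts["mat"])  # 1
```

Counter is itself a dict subclass, so this is still just a hash-table question; it mainly saves the explicit loop.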