Ah, thanks for that question. I was testing from IJulia, and those tests were not showing a boost from Steven Johnson's hash function, perhaps because of the sequence in which I executed the code.
Rerunning as a script from the command line, using the Base.hash trick + SubArrays, yields run times about the same as using Symbol, and also lower memory usage, as you noted.

I just reran the Python vs. Julia tests. The times I get are:

    Julia:   0.46 seconds, run from the command line as 'julia word_count.jl'
    IPython: 0.38 seconds, run from the command line as 'ipython word_count.ipy'

So the Python ran about 20% faster. I'm running Python 2.7.6 64-bit and Julia 0.3.0-prerelease master/4e48d5b*.

The Julia code is posted to https://gist.github.com/catawbasam/9364944. The IPython code I ran was:

    import re
    from collections import Counter
    fn = "/tmp/juliaV2ydrK"
    %time c = Counter(re.split('[ \n\r\t-.,:_";!]', open(fn).read()))
    %time c = Counter(re.split('[ \n\r\t-.,:_";!]', open(fn).read()))
    %time c = Counter(re.split('[ \n\r\t-.,:_";!]', open(fn).read()))
    %time c = Counter(re.split('[ \n\r\t-.,:_";!]', open(fn).read()))

On Tuesday, March 4, 2014 11:34:05 PM UTC-5, Roman Sinayev wrote:
>
> Nice. Mine actually takes 30% more memory now (from how I understand
> Steven's comment, mostly because we're making a copy of the Symbol), but
> time is ~5% faster. Still about 0.55 though.
> Did you run the function several times in the REPL? I am getting these
> numbers when running the script from a command line.
>
> Also wouldn't the result be a dictionary of symbols that I would then have
> to convert to strings again for further analysis?
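For anyone without IPython handy, roughly the same benchmark can be run as a plain script; this is just a sketch, not the code I timed above (the /tmp file was a temporary file, so the sample text here is a stand-in, and the timing call replaces the %time magic):

```python
import re
import time
from collections import Counter

# Same delimiter class as the %time lines above. Note that inside [...]
# the '\t-.' portion is a character *range* (tab through '.'), which is
# how the original pattern behaves as written.
PATTERN = re.compile('[ \n\r\t-.,:_";!]')

def word_count(text):
    # Split on the delimiter class and tally the resulting tokens,
    # exactly as the Counter(re.split(...)) one-liner does.
    return Counter(PATTERN.split(text))

if __name__ == "__main__":
    text = "the quick brown fox, the lazy dog; the end."
    start = time.time()
    c = word_count(text)
    print("%.6f seconds" % (time.time() - start))
```

Splitting on a character class this way leaves empty-string tokens wherever delimiters are adjacent, so for real counts you would likely want to drop the "" key from the Counter afterward.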