Ah, thanks for that question.  I was testing from IJulia.  Those tests were 
not showing a boost from Steven Johnson's hash function, perhaps because of 
the sequence in which I executed the code.

Rerunning as a script from the command line, the Base.hash trick + 
SubArrays yields run times about the same as the Symbol approach, and also 
lower memory usage, as you noted.
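In case it helps others following along, here is a minimal sketch of the substring-keyed counting idea (this is not the gist code, and it uses current syntax rather than 0.3-era types; the delimiter list mirrors the character class in the Python version):

```julia
# Hedged sketch: count words keyed by SubString so each word is a view
# into the original text rather than a copied String.
function wordcount(text::String)
    counts = Dict{SubString{String},Int}()
    delims = [' ', '\n', '\r', '\t', '-', '.', ',', ':', '_', '"', ';', '!']
    for w in split(text, delims)
        isempty(w) && continue          # skip empty tokens between delimiters
        counts[w] = get(counts, w, 0) + 1
    end
    return counts
end
```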

Just reran the Python vs Julia tests. The times I get are:
     Julia     0.46 seconds    run from the command line as 'julia word_count.jl'
     IPython   0.38 seconds    run from the command line as 'ipython word_count.ipy'

So the Python version ran about 20% faster.
I'm running Python 2.7.6 64-bit  and Julia 0.3.0-prerelease master/4e48d5b*.

The Julia code is posted to https://gist.github.com/catawbasam/9364944.

The IPython code I ran was:

import re
from collections import Counter
 
fn = "/tmp/juliaV2ydrK" 
%time c = Counter(re.split('[ \n\r\t-.,:_";!]', open(fn).read()))
%time c = Counter(re.split('[ \n\r\t-.,:_";!]', open(fn).read()))
%time c = Counter(re.split('[ \n\r\t-.,:_";!]', open(fn).read()))
%time c = Counter(re.split('[ \n\r\t-.,:_";!]', open(fn).read()))
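For anyone without IPython's %time magic, a plain-Python equivalent of the same Counter-based count can be timed like this (the in-memory sample text here is a stand-in assumption, since the contents of the temp file aren't shown):

```python
import re
import time
from collections import Counter

# Stand-in for open(fn).read(); the real benchmark reads a temp file.
text = "hello world\nhello, word-count test; test"

start = time.perf_counter()
c = Counter(re.split('[ \n\r\t-.,:_";!]', text))
elapsed = time.perf_counter() - start

print(c.most_common(3))
print("elapsed: %.4f s" % elapsed)
```

Note that, like the IPython version, this leaves empty-string tokens in the Counter wherever two delimiters are adjacent.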


On Tuesday, March 4, 2014 11:34:05 PM UTC-5, Roman Sinayev wrote:
>
> Nice. Mine actually takes 30% more memory now (from how I understand 
> Steven's comment mostly because we're making a copy of the Symbol) , but 
> time is ~5% faster. Still about 0.55 though.
> Did you run the function several times in the REPL? I am getting these 
> numbers when running the script from a command line.
>
> Also wouldn't the result be a dictionary of symbols that I would then have 
> to convert to strings again for further analysis?
>
