On 20 November 2014 10:05, Greg Lee <egreg...@gmail.com> wrote:
>
> Is there a faster way to do the following, which builds a dictionary of
> unique tokens and counts?
I share your frustration regarding this. It should be mentioned, though, that converting tokens to integers is a fairly standard performance hack in Natural Language Processing, even for C/C++ code. I did exactly this for the syntacto-semantic parser I mentioned during my talk at JuliaCon, and at least in Julia it is fairly easy to implement nice types that handle the token-to-id mapping and its inverse:

https://github.com/ninjin/allen/blob/master/src/structs.jl

I also agree that we should improve Julia when it comes to the performance of strings and dictionaries, but for now I am waiting for the upcoming major string code overhaul.

Pontus
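
To illustrate the token-to-integer trick described above, here is a minimal sketch of a two-way mapping type. It is not the code from the linked repository: the names `Vocab`, `id_of!`, `token_of` and `count_tokens` are hypothetical, and it uses current Julia syntax (`struct`) rather than the syntax available in 2014. The idea is to touch the string-keyed dictionary once per token and do all further bookkeeping on plain integers.

```julia
# Hypothetical sketch: names below are illustrative, not from the linked repository.
struct Vocab
    ids::Dict{String,Int}     # token -> id
    tokens::Vector{String}    # id -> token (ids are 1-based indices into this vector)
end

Vocab() = Vocab(Dict{String,Int}(), String[])

# Return the id for `tok`, assigning the next free id the first time it is seen.
function id_of!(v::Vocab, tok::AbstractString)
    get!(v.ids, tok) do
        push!(v.tokens, tok)
        length(v.tokens)
    end
end

# Reverse lookup: id -> token.
token_of(v::Vocab, id::Integer) = v.tokens[id]

# Count occurrences in a dense vector indexed by id.
function count_tokens(v::Vocab, words)
    counts = zeros(Int, length(v.tokens))
    for w in words
        i = id_of!(v, w)
        # A newly assigned id is always one past the current end of `counts`.
        i > length(counts) && push!(counts, 0)
        counts[i] += 1
    end
    return counts
end

# Usage: counts[i] is the number of occurrences of token_of(v, i).
v = Vocab()
counts = count_tokens(v, split("the cat saw the dog"))
```

Whether this is actually faster than counting directly in a `Dict{String,Int}` depends on how often the ids are reused afterwards, so it is worth benchmarking on your own data.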