On 20 November 2014 10:05, Greg Lee <egreg...@gmail.com> wrote:
>
> Is there a faster way to do the following, which builds a dictionary of
> unique tokens and counts?

I share your frustration regarding this.  It should be mentioned,
though, that converting tokens to integers is a fairly standard
performance hack in Natural Language Processing, even for C/C++ code.
I did exactly this for the syntacto-semantic parser I mentioned during
my talk at JuliaCon, and at least in Julia it is fairly easy to
implement nice types that handle the token-to-id mapping and its inverse:

    https://github.com/ninjin/allen/blob/master/src/structs.jl
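
To give a rough idea of the token-to-integer trick, here is a minimal
sketch, assuming current Julia syntax.  The names `Vocab`, `id!`, `token`
and `count_tokens` are made up for illustration and are not taken from
the file linked above.  Each distinct token gets the next free integer
id, and the counts then live in a plain integer-indexed vector rather
than a string-keyed dictionary:

```julia
# Hypothetical bidirectional token <-> id mapping (illustrative names only).
struct Vocab
    ids::Dict{String,Int}     # token -> id
    tokens::Vector{String}    # id -> token
end

Vocab() = Vocab(Dict{String,Int}(), String[])

# Return the id of `tok`, assigning the next free id the first time it is seen.
function id!(v::Vocab, tok::AbstractString)
    s = String(tok)
    get!(v.ids, s) do
        push!(v.tokens, s)
        length(v.tokens)
    end
end

# Inverse mapping: look up the token that belongs to an id.
token(v::Vocab, id::Integer) = v.tokens[id]

# Count occurrences in a vector indexed by token id.
function count_tokens(v::Vocab, words)
    counts = zeros(Int, length(v.tokens))
    for w in words
        i = id!(v, w)
        i > length(counts) && push!(counts, 0)  # a token seen for the first time
        counts[i] += 1
    end
    return counts
end

v = Vocab()
counts = count_tokens(v, split("the cat saw the dog"))
# counts == [2, 1, 1, 1], and token(v, 1) == "the"
```

The strings are hashed once, when they are mapped to ids; everything
downstream can then work on integers.  Whether that pays off for a given
workload is something to measure rather than assume.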

I also agree that we should improve Julia when it comes to the
performance of strings and dictionaries, but for now I am waiting for
the upcoming major string code overhaul.

    Pontus
