2010/1/18 Ted Dunning <ted.dunn...@gmail.com>:
> THANK YOU.

Thank you too! I was about to implement my own regularized SGD linear
classifier using hashed features when I first stumbled upon your patch :)
> I have been very grumpy that I couldn't get to doing this yet.
>
> I will coordinate closely with you. I haven't used git yet in anger so it
> will be a learning experience. Don't expect me to have time, though. (I
> will try ... but expect not to find a hole)

I'm fairly new to git too, but since all Apache projects are mirrored on
git repos I thought it was worth giving it a try. The following helped me
a lot in getting started: http://learn.github.com

This is interesting too: http://nvie.com/archives/323

In the meantime, could you please give me some hints on the hashed
feature encoding?

- How should the number of probes of the binary randomizer be chosen
  w.r.t. the window size?
- What is the impact of the allPairs feature?
- What is the impact of the window size? Which values make sense for
  average English documents?
- Do you think the terms should be stop-word filtered / stemmed first?
- Do you think it would be worth making the Randomizer API compute hashes
  of n-grams (n > 1) too?
- How does the BinaryRandomizer compare to the DenseRandomizer
  (performance-wise and accuracy-wise)?

--
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name
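P.S. To make my question about the probes concrete, here is how I
currently picture multi-probe feature hashing. All names here are made up
for illustration; this is a sketch of the general technique, not the
actual Randomizer API from the patch:

```python
import hashlib

def hashed_features(tokens, dim=2 ** 18, probes=2):
    """Map tokens into a fixed-size vector of dimension `dim`.

    Each token is hashed `probes` times (by salting the hash input
    with the probe index), and each probe deposits 1/probes of the
    token's weight, so collisions from a single hash function are
    less likely to wipe out a feature entirely.
    """
    v = [0.0] * dim
    for tok in tokens:
        for p in range(probes):
            # Salt with the probe index to get `probes` distinct hashes.
            h = int(hashlib.md5(f"{p}:{tok}".encode()).hexdigest(), 16)
            v[h % dim] += 1.0 / probes
    return v

vec = hashed_features("the quick brown fox".split(), dim=16, probes=2)
# The total deposited mass equals the number of tokens,
# regardless of collisions.
print(sum(vec))
```

My intuition is that more probes reduce the damage of collisions but fill
the vector faster, which is why I am asking how to balance them against
the window size.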