2010/1/18 Ted Dunning <ted.dunn...@gmail.com>:
> THANK YOU.

Thank you too! I was about to implement my own regularized SGD linear
classifier using hashed features when I first stumbled upon your patch
:)

> I have been very grumpy that I couldn't get to doing this yet.
>
> I will coordinate closely with you.  I haven't used git yet in anger so it
> will be a learning experience.  Don't expect me to have time, though.  ( I
> will try ... but expect not to find a hole )

I'm fairly new to git too, but since all Apache projects are mirrored
on git repos I thought it would be worth giving it a try. The
following helped me a lot in getting started: http://learn.github.com
This is interesting too: http://nvie.com/archives/323

In the meantime, could you please give me a hint on how to choose the
number of probes of the binary randomizer w.r.t. the window size?
What is the impact of the allPairs feature?
What is the impact of the window size? Which values make sense for
average English documents?
Do you think the terms should be stop-word filtered / stemmed first?
Do you think it would be worth making the Randomizer API compute
hashes of n-grams (n > 1) too?
How does the BinaryRandomizer compare to the DenseRandomizer
(performance-wise and accuracy-wise)?
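For context on the probes question, here is a minimal sketch of the
multi-probe feature-hashing idea being discussed: each token is hashed
with k independent hash functions ("probes") into a fixed-size vector,
which spreads hash collisions across several slots. This is only an
illustration of the general technique, not the Mahout Randomizer API;
the function name, the use of CRC32, and the probe-salting scheme are
my own assumptions.

```python
import zlib

def hash_features(tokens, dim=2**18, probes=2):
    """Map tokens into a fixed-size dense vector using `probes`
    salted hash functions per token (illustrative sketch only)."""
    vec = [0.0] * dim
    for tok in tokens:
        for p in range(probes):
            # Salt the hash with the probe index so each probe
            # behaves like an independent hash function.
            h = zlib.crc32(f"{p}:{tok}".encode()) % dim
            vec[h] += 1.0
    return vec

# Example: 4 tokens, 2 probes -> 8 increments total, regardless of
# collisions within the 16-slot vector.
v = hash_features("the quick brown fox".split(), dim=16, probes=2)
```

The trade-off the questions above are probing at: more probes reduce
the damage of any single collision, but fill the vector faster, so the
right number depends on the vector (window) size relative to the
vocabulary.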

-- 
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name
