2010/1/18 Ted Dunning <ted.dunn...@gmail.com>:
> THANK YOU.

Thank you too! I was about to implement my own regularized SGD linear
classifier using hashed features when I first stumbled upon your patch :)
> I have been very grumpy that I couldn't get to doing this yet.
>
> I will coordinate closely with you. I haven't used git yet in anger so it
> will be a learning experience. Don't expect me to have time, though. (I
> will try ... but expect not to find a hole)

I'm fairly new to git too, but since all Apache projects are mirrored on
git repos I thought it was worth giving it a try. The following helped me
a lot in getting started: http://learn.github.com

This is interesting too: http://nvie.com/archives/323

In the meantime, could you please give me some hints on the hashed
feature encoding?

- How should the number of probes of the binary randomizer be chosen
  w.r.t. the window size?
- What is the impact of the allPairs feature?
- What is the impact of the window size? Which values make sense for
  average English documents?
- Do you think the terms should be stop-word filtered / stemmed first?
- Do you think it would be worth making the Randomizer API compute hashes
  of n-grams (n > 1) too?
- How does the BinaryRandomizer compare to the DenseRandomizer
  (performance-wise and accuracy-wise)?

--
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name
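P.S. To make my question about the probes concrete, here is how I
currently picture multi-probe feature hashing. All names here are made up
for illustration; this is a sketch of the general technique, not the
actual Randomizer API from the patch:

```python
import hashlib

def hashed_features(tokens, dim=2 ** 18, probes=2):
    """Map tokens into a fixed-size vector of dimension `dim`.

    Each token is hashed `probes` times (by salting the hash input
    with the probe index), and each probe deposits 1/probes of the
    token's weight, so collisions from a single hash function are
    less likely to wipe out a feature entirely.
    """
    v = [0.0] * dim
    for tok in tokens:
        for p in range(probes):
            # Salt with the probe index to get `probes` distinct hashes.
            h = int(hashlib.md5(f"{p}:{tok}".encode()).hexdigest(), 16)
            v[h % dim] += 1.0 / probes
    return v

vec = hashed_features("the quick brown fox".split(), dim=16, probes=2)
# The total deposited mass equals the number of tokens,
# regardless of collisions.
print(sum(vec))
```

My intuition is that more probes reduce the damage of collisions but fill
the vector faster, which is why I am asking how to balance them against
the window size.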