Re: Classifier Interface

Ted Dunning Fri, 20 May 2011 14:40:39 -0700

The argument for the no-link versions is that they allow partial evaluation
of models, which in turn allows substantial amounts of computation to be
done off-line.  This makes a >10x difference in speed in some cases.  See
chapters 16 and 17 of MiA.  The contract here should be that if your
classifier can be factored into a linear combination of sub-models, a
linear combination that is normally transformed before being returned, then
*NoLink should return the untransformed value.


Yes, it should be NoInverseLink, but I got tired of typing.
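
To make the no-link contract concrete, here is a minimal sketch of my own,
not the actual Mahout code.  The class name NoLinkSketch, the
partialNoLink helper and the use of plain double[] instead of Vector are
all assumptions for illustration.  It shows a logistic-style model whose
raw score is a weighted sum, so part of that sum can be computed off-line
and combined later, with the transformation applied only once at the end.

```java
/** Hypothetical sketch of the no-link contract; not Mahout's real classes. */
final class NoLinkSketch {
    private final double[] weights;

    NoLinkSketch(double[] weights) {
        this.weights = weights.clone();
    }

    /** Weighted sum over features [from, to), with no transformation applied. */
    double partialNoLink(double[] features, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    /** Raw score: the full linear combination, before the transformation. */
    double classifyScalarNoLink(double[] features) {
        return partialNoLink(features, 0, weights.length);
    }

    /** The transformation normally applied to the raw score (logistic here). */
    static double link(double rawScore) {
        return 1.0 / (1.0 + Math.exp(-rawScore));
    }

    /** Transformed score: the raw score pushed through the logistic function. */
    double classifyScalar(double[] features) {
        return link(classifyScalarNoLink(features));
    }
}
```

Because the raw scores are additive, a caller can compute
partialNoLink(x, 0, k) off-line for the features that rarely change, add
partialNoLink(x, k, n) at request time, and apply link() to the total.
That only works on the untransformed values, which is why *NoLink has to
return them.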

The argument for the *Scalar forms is that in the binomial case we can avoid
allocation or copying of a result
entirely.  This makes a noticeable difference in performance in a few cases.
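
A rough illustration of the allocation point, again a sketch under my own
assumptions rather than the real interface: ScalarSketch is a made-up
class, and double[] stands in for whatever result object the vector form
returns.

```java
/** Hypothetical binomial classifier contrasting scalar and vector results. */
final class ScalarSketch {
    private final double[] weights;

    ScalarSketch(double[] weights) {
        this.weights = weights.clone();
    }

    /** Scalar form: the score comes back as a primitive, nothing is allocated. */
    double classifyScalar(double[] features) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-sum));
    }

    /** Vector form: same score, but every call allocates a result holder. */
    double[] classify(double[] features) {
        return new double[] {classifyScalar(features)};
    }
}
```

In a tight scoring loop the scalar form creates no garbage per call, while
the vector form creates one small object per call even though it carries a
single number.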

The argument for the *Full forms is that the partial forms are cheaper to
compute for many classifiers.
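
One way to picture the relationship, assuming the partial form returns
probabilities for all but one category and the full form returns all of
them (FullFormSketch and its method are hypothetical names, and double[]
stands in for the real result type):

```java
/** Hypothetical sketch: deriving the full result from the partial one. */
final class FullFormSketch {
    /** Expands n-1 partial probabilities into n by restoring category 0. */
    static double[] classifyFull(double[] partial) {
        double[] full = new double[partial.length + 1];
        double sum = 0.0;
        for (int i = 0; i < partial.length; i++) {
            full[i + 1] = partial[i];
            sum += partial[i];
        }
        // The omitted category takes whatever probability mass is left.
        full[0] = 1.0 - sum;
        return full;
    }
}
```

Under that assumption the full form costs an extra pass and a larger
result, so callers that only need the partial scores can skip it.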

On Fri, May 20, 2011 at 1:04 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Probably better ClassifierResult.classify(ClassifyResult holder), for
> the memory sake..
>
> I like the idea a lot. Except multinomial case may encounter a lot of
> unnecessary copying really so again, tragically critical for iteration
> time here.
>
> On Fri, May 20, 2011 at 1:01 PM, Grant Ingersoll <[email protected]> wrote:
> > Perhaps we should future proof here a little bit and simply have a
> > classify method that returns a typed object that contains the necessary
> > info depending on the implementation?  Something like:
> > ClassifierResult classify()
> >
> > and then ClassifierResult has an enum or something that indicates whether
> > one should grab the Vector, Matrix or double.   Just brainstorming...
> >
> >
> > On May 20, 2011, at 2:21 PM, Hector Yee wrote:
> >
> >> Hi,
> >>
> >>  I noticed that classifier has three functions to call to get the score.
> >> classify - returns probabilities
> >> classifyNoLink - returns the raw score (optional)
> >> classifyScalar - returns the binary probability
> >>
> >> I'm working on a few classifiers for which it doesn't make sense to
> >> return probability. In fact, the probability is just the raw score
> >> exponentiated. This would distort the scores a bit, rather than if the
> >> user just used the raw score directly. Also, if they assume that the
> >> scores are really probabilities they may be tempted to use it to compare
> >> between two classifiers without previously calibrating on a test set.
> >>
> >> I wonder if we can add classifiyScalarNoLink and make the NoLinks
> >> non-optional. They just return probabilities if you're using a
> >> classifier that returns in the 0-1 range.
> >> This way people can choose to use either interface primarily, rather
> >> than calling classify and assume all classifiers support probabilities.
> >>
> >> Finally, there's some algorithms that can return regression / ranking or
> >> classification scores depending on the training data. I was just
> >> planning to return the same value via classifiyScalarNoLink but it seems
> >> to be a poorly named proposed function. I could just name the function
> >> 'score' but it would break the naming convention already set down.
> >>
> >> Thoughts?
> >>
> >> --
> >> Yee Yang Li Hector
> >> http://hectorgon.blogspot.com/ (tech + travel)
> >> http://hectorgon.com (book reviews)
> >
> >
> >
>
