The argument for the no-link versions is that they allow partial evaluation of models, which in turn allows a substantial amount of computation to be done off-line. This makes a >10x difference in speed in some cases. See chapters 16 and 17 of MiA. The contract here should be that if your classifier can be factored into a linear combination of sub-models, and that linear combination is normally passed through a transform (the inverse link), *NoLink should return the untransformed linear combination.
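
To make that contract concrete, here is a minimal sketch assuming a binomial logistic model with a plain weight vector. The class and the `partialScore` helper are hypothetical, and the method names only echo the ones discussed in this thread; this is not Mahout's actual interface. It shows one sub-model being evaluated off-line, the rest added later, and the link applied once at the end.

```java
public class NoLinkSketch {
    /** Inverse link for the binomial case: the logistic function. */
    static double link(double rawScore) {
        return 1.0 / (1.0 + Math.exp(-rawScore));
    }

    /** Dot product over features [from, to): one sub-model of the linear combination. */
    static double partialScore(double[] weights, double[] features, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    /** Raw linear score with no inverse link applied (hypothetical signature). */
    static double classifyScalarNoLink(double[] weights, double[] features) {
        return partialScore(weights, features, 0, weights.length);
    }

    /** Binomial probability returned as a primitive, so no result object is allocated. */
    static double classifyScalar(double[] weights, double[] features) {
        return link(classifyScalarNoLink(weights, features));
    }

    public static void main(String[] args) {
        double[] weights = {0.5, -1.0, 2.0, 0.25};
        // Assumption for illustration: features 0..1 are known ahead of time,
        // features 2..3 only arrive at request time.
        double[] features = {1.0, 2.0, 0.5, 4.0};

        // Off-line: evaluate the sub-model that depends only on the known features.
        double cached = partialScore(weights, features, 0, 2);

        // On-line: add the remaining sub-model, then apply the link once.
        double raw = cached + partialScore(weights, features, 2, 4);

        // The partial sums agree with the full evaluation up to rounding.
        boolean sameRaw = Math.abs(raw - classifyScalarNoLink(weights, features)) < 1e-12;
        boolean sameProb = Math.abs(link(raw) - classifyScalar(weights, features)) < 1e-12;
        System.out.println("raw scores agree: " + sameRaw);
        System.out.println("probabilities agree: " + sameProb);
    }
}
```

The partial sums can only be combined like this before the link is applied, which is why the no-link value has to be exposed.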

Yes, it should be NoInverseLink, but I got tired of typing.

The argument for the *Scalar forms is that in the binomial case we can avoid allocation or copying of a result entirely. This makes a noticeable difference in performance in a few cases.

The argument for the *Full forms is that the partial forms are cheaper to compute for many classifiers.

On Fri, May 20, 2011 at 1:04 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Probably better ClassifierResult.classify(ClassifyResult holder), for
> the memory sake..
>
> I like the idea a lot. Except multinomial case may encounter a lot of
> unnecessary copying really so again, tragically critical for iteration
> time here.
>
> On Fri, May 20, 2011 at 1:01 PM, Grant Ingersoll <[email protected]> wrote:
>
> > Perhaps we should future proof here a little bit and simply have a
> > classify method that returns a typed object that contains the necessary
> > info depending on the implementation? Something like:
> > ClassifierResult classify()
> >
> > and then ClassifierResult has an enum or something that indicates whether
> > one should grab the Vector, Matrix or double. Just brainstorming...
> >
> > On May 20, 2011, at 2:21 PM, Hector Yee wrote:
> >
> >> Hi,
> >>
> >> I noticed that classifier has three functions to call to get the score.
> >> classify - returns probabilities
> >> classifyNoLink - returns the raw score (optional)
> >> classifyScalar - returns the binary probability
> >>
> >> I'm working on a few classifiers for which it doesn't make sense to return
> >> probability. In fact, the probability is just the raw score exponentiated.
> >> This would distort the scores a bit, rather than if the user just used the
> >> raw score directly. Also, if they assume that the scores are really
> >> probabilities they may be tempted to use it to compare between two
> >> classifiers without previously calibrating on a test set.
> >>
> >> I wonder if we can add classifiyScalarNoLink and make the NoLinks
> >> non-optional. They just return probabilities if you're using a classifier
> >> that returns in the 0-1 range.
> >> This way people can choose to use either interface primarily, rather than
> >> calling classify and assume all classifiers support probabilities.
> >>
> >> Finally, there's some algorithms that can return regression / ranking or
> >> classification scores depending on the training data. I was just planning to
> >> return the same value via classifiyScalarNoLink but it seems to be a poorly
> >> named proposed function. I could just name the function 'score' but it would
> >> break the naming convention already set down.
> >>
> >> Thoughts?
> >>
> >> --
> >> Yee Yang Li Hector
> >> http://hectorgon.blogspot.com/ (tech + travel)
> >> http://hectorgon.com (book reviews)
