Perhaps we should future-proof this a little and simply have a classify method that returns a typed object containing whatever information the implementation provides? Something like:

ClassifierResult classify()
Then ClassifierResult would have an enum or something that indicates whether one should grab the Vector, the Matrix or the double. Just brainstorming...

On May 20, 2011, at 2:21 PM, Hector Yee wrote:

> Hi,
>
> I noticed that classifier has three functions to call to get the score:
> classify - returns probabilities
> classifyNoLink - returns the raw score (optional)
> classifyScalar - returns the binary probability
>
> I'm working on a few classifiers for which it doesn't make sense to return
> a probability. In fact, the probability is just the raw score exponentiated.
> This distorts the scores somewhat compared with using the raw score
> directly. Also, if users assume that the scores really are probabilities,
> they may be tempted to use them to compare two classifiers without first
> calibrating on a test set.
>
> I wonder if we can add classifyScalarNoLink and make the NoLinks
> non-optional. For a classifier whose scores already lie in the 0-1 range,
> they would simply return probabilities.
> This way people can choose either interface as their primary one, rather
> than calling classify and assuming all classifiers support probabilities.
>
> Finally, some algorithms can return regression, ranking or classification
> scores depending on the training data. I was planning to return that value
> via classifyScalarNoLink, but that seems a poor name for it. I could just
> name the function 'score', but that would break the naming convention
> already set down.
>
> Thoughts?
>
> --
> Yee Yang Li Hector
> http://hectorgon.blogspot.com/ (tech + travel)
> http://hectorgon.com (book reviews)
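To make the ClassifierResult idea concrete, here is a minimal Java sketch of one way it could look. Everything in it is hypothetical: the Kind enum, the ofScalar/ofVector/ofMatrix factories and the getters are names invented for illustration, and double[] and double[][] stand in for the Vector and Matrix types mentioned in the reply so the example compiles on its own.

```java
import java.util.Arrays;

/**
 * Hypothetical single result type for classify(). The Kind enum tells the
 * caller which payload is populated; double[] and double[][] are stand-ins
 * for the Vector and Matrix types so this compiles on its own.
 */
final class ClassifierResult {

  /** Which payload the caller should read. */
  public enum Kind { SCALAR, VECTOR, MATRIX }

  private final Kind kind;
  private final double scalar;
  private final double[] vector;
  private final double[][] matrix;

  private ClassifierResult(Kind kind, double scalar, double[] vector, double[][] matrix) {
    this.kind = kind;
    this.scalar = scalar;
    this.vector = vector;
    this.matrix = matrix;
  }

  public static ClassifierResult ofScalar(double value) {
    return new ClassifierResult(Kind.SCALAR, value, null, null);
  }

  public static ClassifierResult ofVector(double[] values) {
    return new ClassifierResult(Kind.VECTOR, Double.NaN, values.clone(), null);
  }

  public static ClassifierResult ofMatrix(double[][] values) {
    return new ClassifierResult(Kind.MATRIX, Double.NaN, null, deepCopy(values));
  }

  public Kind getKind() {
    return kind;
  }

  public double getScalar() {
    require(Kind.SCALAR);
    return scalar;
  }

  public double[] getVector() {
    require(Kind.VECTOR);
    return vector.clone(); // defensive copy keeps the result immutable
  }

  public double[][] getMatrix() {
    require(Kind.MATRIX);
    return deepCopy(matrix);
  }

  // Reading the wrong payload is a programming error, so fail loudly.
  private void require(Kind expected) {
    if (kind != expected) {
      throw new IllegalStateException("result holds " + kind + ", not " + expected);
    }
  }

  private static double[][] deepCopy(double[][] m) {
    double[][] copy = new double[m.length][];
    for (int i = 0; i < m.length; i++) {
      copy[i] = m[i].clone();
    }
    return copy;
  }

  public static void main(String[] args) {
    ClassifierResult r = ClassifierResult.ofVector(new double[] {0.2, 0.8});
    // Callers dispatch on the kind instead of guessing which method to call.
    switch (r.getKind()) {
      case SCALAR:
        System.out.println(r.getScalar());
        break;
      case VECTOR:
        System.out.println(Arrays.toString(r.getVector()));
        break;
      case MATRIX:
        System.out.println(Arrays.deepToString(r.getMatrix()));
        break;
    }
  }
}
```

In this sketch, asking for a payload of the wrong kind throws rather than returning a meaningless default, so a caller that ignores getKind() finds out immediately. New result shapes could be added later as new Kind values without changing the classify() signature, which is the future-proofing the reply is after.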

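Hector's quoted proposal (make the NoLink raw-score methods mandatory and treat the probability methods as a link applied on top) can be sketched the same way. This is a minimal Java sketch under stated assumptions: the interface name ScoringClassifier and the toy LinearScorer are hypothetical, the logistic link for classifyScalar and the softmax link for classify are chosen for illustration only, and double[] stands in for a vector type.

```java
import java.util.Arrays;

/**
 * Hypothetical interface: the NoLink (raw score) methods are mandatory and
 * the probability methods are derived from them. The logistic and softmax
 * links are assumptions chosen for illustration.
 */
interface ScoringClassifier {

  /** Raw, uncalibrated score for the positive class. */
  double classifyScalarNoLink(double[] instance);

  /** Raw, uncalibrated score for every class. */
  double[] classifyNoLink(double[] instance);

  /** Binary probability: logistic link applied to the raw score. */
  default double classifyScalar(double[] instance) {
    return 1.0 / (1.0 + Math.exp(-classifyScalarNoLink(instance)));
  }

  /** Per-class probabilities: softmax link applied to the raw scores. */
  default double[] classify(double[] instance) {
    double[] raw = classifyNoLink(instance);
    double max = Arrays.stream(raw).max().orElse(0.0);
    double[] p = new double[raw.length];
    double sum = 0.0;
    for (int i = 0; i < raw.length; i++) {
      p[i] = Math.exp(raw[i] - max); // shift by max to avoid overflow
      sum += p[i];
    }
    for (int i = 0; i < p.length; i++) {
      p[i] /= sum;
    }
    return p;
  }
}

/** Toy two-class linear scorer; only the raw scores are implemented. */
class LinearScorer implements ScoringClassifier {
  private final double[] weights;

  LinearScorer(double[] weights) {
    this.weights = weights.clone();
  }

  @Override
  public double classifyScalarNoLink(double[] x) {
    double s = 0.0;
    for (int i = 0; i < weights.length; i++) {
      s += weights[i] * x[i];
    }
    return s;
  }

  @Override
  public double[] classifyNoLink(double[] x) {
    // Negative class fixed at 0, positive class gets the linear score.
    return new double[] {0.0, classifyScalarNoLink(x)};
  }
}

class ScoringDemo {
  public static void main(String[] args) {
    ScoringClassifier c = new LinearScorer(new double[] {2.0});
    double[] x = {1.0};
    System.out.println("raw score:   " + c.classifyScalarNoLink(x));
    System.out.println("probability: " + c.classifyScalar(x));
    System.out.println("per class:   " + Arrays.toString(c.classify(x)));
  }
}
```

With raw scores {0, s} for two classes, the softmax reduces to the logistic, so classify(x)[1] equals classifyScalar(x) here. A caller who wants uncalibrated scores (for ranking or regression-style output, say) calls the NoLink methods and never sees the link at all.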