On Fri, May 20, 2011 at 11:21 AM, Hector Yee <[email protected]> wrote:
> I'm working on a few classifiers for which it doesn't make sense to return
> probability. In fact, the probability is just the raw score exponentiated.
> This would distort the scores a bit, rather than if the user just used the
> raw score directly. Also, if they assume that the scores are really
> probabilities they may be tempted to use it to compare between two
> classifiers without previously calibrating on a test set.
>

I am not too worried about the calibration issue, since it is reasonable to
handle that with documentation. Returning raw scores without the
exponentiation is a natural thing to do with the noLink form.

>
> I wonder if we can add classifiyScalarNoLink and make the NoLinks
> non-optional. They just return probabilities if you're using a classifier
> that returns in the 0-1 range.
> This way people can choose to use either interface primarily, rather than
> calling classify and assume all classifiers support probabilities.
>

I can't quite tell what you are suggesting here. I think you have the tail
of a good idea, but I can't see the spots on it yet. Can you be more concrete
about what you are proposing?

>
> Finally, there's some algorithms that can return regression / ranking or
> classification scores depending on the training data. I was just planning to
> return the same value via classifiyScalarNoLink but it seems to be a poorly
> named proposed function. I could just name the function 'score' but it would
> break the naming convention already set down.
>

score is a reasonable name. classifiyScalarNoLink is fairly descriptive if
you know the jargon, but score may be better. One problem I have is that
people are already using this code in production, so name changes are a bit
painful.

I do think that returning scores without reducing to the 0..1 range is an
important operation.
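
To make the distinction between a raw score and a link-transformed value concrete, here is a minimal sketch of what such an interface could look like. Everything in it is an assumption made for illustration: the names ScoringClassifier and LinearScorer are hypothetical and are not the existing API, arrays stand in for the vector type, and the logistic function is used as the link only as an example (the classifiers discussed above may use a plain exponentiation or something else).

```java
// Hypothetical sketch only: these names are illustrative, not an existing API.
interface ScoringClassifier {
    /** Raw, unbounded score with no link function applied. */
    double score(double[] features);

    /**
     * The same score squashed into the (0, 1) range.
     * The logistic link used here is an assumption for this example.
     */
    default double classifyScalar(double[] features) {
        return 1.0 / (1.0 + Math.exp(-score(features)));
    }
}

/** A toy linear model: score = bias + sum of weights[i] * features[i]. */
final class LinearScorer implements ScoringClassifier {
    private final double[] weights;
    private final double bias;

    LinearScorer(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    @Override
    public double score(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("feature/weight length mismatch");
        }
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }
}
```

With this shape, a ranking or regression model only has to implement score, and callers who want the undistorted value never go through the link. Whether the 0..1 method should be optional or mandatory is exactly the open question in this thread, so the default method here should not be read as a recommendation.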
