Github user srowen commented on the issue: https://github.com/apache/spark/pull/14949

Oh, I get it now; that makes sense. If this were being applied to decision trees only, that would make sense, and we could fix this up and document the meaning. I agree it only makes sense to return "no class" when actually thresholding. The problem is that this is not applied just to a random forest implementation but to all classifiers that output a probability, which is more of a stretch. I suppose the result here can be thought of as a likelihood ratio of class probability vs. prior, rather than some hacky heuristic specific to the CRAN package. The name is unfortunate, because I would not have guessed that meaning from the name alone (though, to be fair, the scaladoc does say what it means).

I'll close this, but what's the best way forward?

Option 1. Keep the current behavior. Modify https://github.com/apache/spark/pull/14643 to include Nick's suggestions above, and add documentation about what 'thresholds' really means here.

Option 2. As above, but also deprecate 'thresholds' and rename it to 'cutoff' to be a little clearer.

Option 3. As in Option 2, but also go back and actually implement true thresholds.
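The two readings of "thresholds" being contrasted above can be sketched roughly as follows (hypothetical helper names, not Spark API; the scaled-argmax rule is the behavior the scaladoc describes, while the hard cutoff is the "no class" behavior discussed in this thread; all thresholds are assumed positive):

```python
def predict_with_thresholds(probabilities, thresholds):
    # Current behavior: scale each class probability by 1/threshold and
    # pick the argmax. A smaller threshold makes a class easier to predict.
    scaled = [p / t for p, t in zip(probabilities, thresholds)]
    return max(range(len(scaled)), key=scaled.__getitem__)

def predict_with_cutoff(probabilities, cutoff):
    # Hard cutoff: return the top class only if its probability clears
    # the cutoff; otherwise return None ("no class").
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best if probabilities[best] >= cutoff else None
```

For example, with probabilities [0.4, 0.6] and thresholds [0.3, 0.9], the scaled values are [1.33, 0.67], so class 0 is predicted even though class 1 has the higher raw probability; a hard cutoff of 0.7 on the same probabilities would return "no class" instead.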