Github user sethah commented on the pull request:

https://github.com/apache/spark/pull/12663#issuecomment-214790032

Out of curiosity, if or when [SPARK-7126](https://issues.apache.org/jira/browse/SPARK-7126) is implemented, do we plan to remove this behavior?

Regarding small datasets and cross validation, it is a bit concerning that the model could get trained with an incorrect number of classes, and since it will happen silently, it could create some confusion. However, I think it is reasonable to expect that end users should realize that some splits of their data could be missing label class values, and without explicitly flagging the number of classes, there is no way for the algorithm to know.
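To make the cross-validation concern concrete, here is a minimal sketch in plain Python. It is not Spark's implementation: `resolve_num_classes` is a hypothetical helper, and the sketch assumes labels are 0-based integer indices and that the class count is inferred as the largest label plus one when it is not given explicitly.

```python
def resolve_num_classes(labels, num_classes=None):
    """Return the class count for a training split.

    Hypothetical helper: if num_classes is not given, infer it as
    max(label) + 1, assuming labels are 0-based integer indices.
    """
    if num_classes is not None:
        return num_classes
    return int(max(labels)) + 1


# A small dataset with three classes; class 2 appears only at the end.
labels = [0, 0, 1, 1, 0, 1, 2, 2]

# Contiguous 4-fold split (no shuffling, to keep the example deterministic).
k = 4
fold_size = len(labels) // k

inferred = []
for i in range(k):
    # Training split = everything except the i-th held-out fold.
    train = labels[:i * fold_size] + labels[(i + 1) * fold_size:]
    inferred.append(resolve_num_classes(train))

# The last fold holds out every example of class 2, so that training
# split silently infers 2 classes instead of 3.
print(inferred)  # [3, 3, 3, 2]

# Passing the class count explicitly removes the ambiguity.
last_train = labels[:(k - 1) * fold_size]
print(resolve_num_classes(last_train, num_classes=3))  # 3
```

In this sketch only the split that happens to hold out every example of the rarest class is affected, which is why the problem is easy to miss on small datasets.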