Elisabeth Niederbacher created SPARK-44848: ----------------------------------------------
Summary: MLlib GBTClassifier has wrong impurity method 'variance' instead of 'gini' or 'entropy'.
Key: SPARK-44848
URL: https://issues.apache.org/jira/browse/SPARK-44848
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 3.4.1
Reporter: Elisabeth Niederbacher

The impurity method 'variance' should only be used for regressors, *not* classifiers. For classifiers, 'gini' and 'entropy' should be available, as is already the case for the RandomForestClassifier [https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.ml.classification.RandomForestClassifier.html].

Because of this bug, the 'minInfoGain' hyperparameter cannot be tuned to combat overfitting.
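To illustrate the impurity measures this issue refers to, here is a minimal plain-Python sketch (independent of Spark) that computes Gini impurity, entropy, and variance for the labels at a single tree node, plus the information gain of a candidate split. The function names `gini`, `entropy`, `variance` and `info_gain` are hypothetical and are not part of the MLlib API. MLlib's own implementations may differ in details such as weighting and numerical handling.

```python
from collections import Counter
from math import log2
from typing import Callable, Sequence

# Hypothetical helpers for illustration only; not MLlib API.


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 - sum_k p_k^2 over the class frequencies."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def variance(labels: Sequence[float]) -> float:
    """Population variance of the label values (a regression impurity)."""
    n = len(labels)
    mean = sum(labels) / n
    return sum((y - mean) ** 2 for y in labels) / n


def info_gain(
    impurity: Callable[[Sequence[float]], float],
    parent: Sequence[float],
    left: Sequence[float],
    right: Sequence[float],
) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted_children


if __name__ == "__main__":
    node = [0, 0, 1, 1]  # balanced binary labels at one node
    for name, fn in [("gini", gini), ("entropy", entropy), ("variance", variance)]:
        # Impurity of the node and gain of a perfect split into [0, 0] / [1, 1]
        print(name, fn(node), info_gain(fn, node, [0, 0], [1, 1]))
```

For the same perfect split, the three measures report gains on different scales (0.5 for Gini, 1.0 for entropy in bits, 0.25 for variance). Tree learners, MLlib included, generally discard a candidate split whose gain falls below `minInfoGain`, so a useful threshold depends on which impurity measure is in use.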