[ https://issues.apache.org/jira/browse/SPARK-28434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-28434.
-------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 25485
[https://github.com/apache/spark/pull/25485]

> Decision Tree model isn't equal after save and load
> ---------------------------------------------------
>
>                 Key: SPARK-28434
>                 URL: https://issues.apache.org/jira/browse/SPARK-28434
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.4.3
>         Environment: spark from master
>            Reporter: Ievgen Prokhorenko
>            Assignee: Ievgen Prokhorenko
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> The file `mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala` on the line no. 628 has a TODO saying:
>
> {code:java}
> // TODO: Check other fields besides the information gain.
> {code}
> If, in addition to the existing check of InformationGainStats' gain value I add another check, for instance, impurity – the test fails because the values are different in the saved model and the one restored from disk.
>
> See PR with an example.
>
> The tests are executed with this command:
>
> {code:java}
> build/mvn -e -Dtest=none -DwildcardSuites=org.apache.spark.mllib.tree.DecisionTreeSuite test
> {code}
>
> Excerpts from the output of the command above:
> {code:java}
> ...
> - model save/load *** FAILED ***
> checkEqual failed since the two trees were not identical.
> TREE A:
> DecisionTreeModel classifier of depth 2 with 5 nodes
>   If (feature 0 <= 0.5)
>    Predict: 0.0
>   Else (feature 0 > 0.5)
>    If (feature 1 in {0.0,1.0})
>     Predict: 0.0
>    Else (feature 1 not in {0.0,1.0})
>     Predict: 0.0
> TREE B:
> DecisionTreeModel classifier of depth 2 with 5 nodes
>   If (feature 0 <= 0.5)
>    Predict: 0.0
>   Else (feature 0 > 0.5)
>    If (feature 1 in {0.0,1.0})
>     Predict: 0.0
>    Else (feature 1 not in {0.0,1.0})
>     Predict: 0.0 (DecisionTreeSuite.scala:610)
> ...
> {code}
> If I add a little debug info in the `DecisionTreeSuite.checkEqual`:
>
> {code:java}
> val aStats = a.stats
> val bStats = b.stats
> println(s"id ${a.id} ${b.id}")
> println(s"impurity ${aStats.get.impurity} ${bStats.get.impurity}")
> println(s"leftImpurity ${aStats.get.leftImpurity} ${bStats.get.leftImpurity}")
> println(s"rightImpurity ${aStats.get.rightImpurity} ${bStats.get.rightImpurity}")
> println(s"leftPredict ${aStats.get.leftPredict} ${bStats.get.leftPredict}")
> println(s"rightPredict ${aStats.get.rightPredict} ${bStats.get.rightPredict}")
> println(s"gain ${aStats.get.gain} ${bStats.get.gain}")
> {code}
>
> Then, in the output of the test command we can see that only values of `gain` are equal:
>
> {code:java}
> id 1 1
> impurity 0.2 0.5
> leftImpurity 0.3 0.5
> rightImpurity 0.4 0.5
> leftPredict 1.0 (prob = 0.4) 0.0 (prob = 1.0)
> rightPredict 0.0 (prob = 0.6) 0.0 (prob = 1.0)
> gain 0.1 0.1
> {code}
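
As a rough illustration of the kind of field-by-field comparison the TODO asks for, here is a minimal, self-contained Scala sketch. It does not use Spark: `NodeStats`, `StatsDiff`, and `differingFields` are hypothetical names introduced only for this example, and the case class merely mirrors some of the field names printed in the debug output above. The sample values are the ones reported for node id 1. It is not the change made in the pull request.

```scala
// Hypothetical stand-in for the per-node stats shown in the debug output.
// This is NOT Spark's InformationGainStats; it only reuses the field names.
final case class NodeStats(
    gain: Double,
    impurity: Double,
    leftImpurity: Double,
    rightImpurity: Double)

object StatsDiff {
  // Returns the names of the fields whose values differ by more than `eps`.
  def differingFields(a: NodeStats, b: NodeStats, eps: Double = 1e-9): Seq[String] = {
    val fields = Seq(
      ("gain", a.gain, b.gain),
      ("impurity", a.impurity, b.impurity),
      ("leftImpurity", a.leftImpurity, b.leftImpurity),
      ("rightImpurity", a.rightImpurity, b.rightImpurity))
    fields.collect { case (name, x, y) if math.abs(x - y) > eps => name }
  }

  def main(args: Array[String]): Unit = {
    // Values taken from the debug output for node id 1:
    // first column = model before save, second column = model after load.
    val original = NodeStats(gain = 0.1, impurity = 0.2, leftImpurity = 0.3, rightImpurity = 0.4)
    val restored = NodeStats(gain = 0.1, impurity = 0.5, leftImpurity = 0.5, rightImpurity = 0.5)
    // prints "impurity, leftImpurity, rightImpurity"
    println(differingFields(original, restored).mkString(", "))
  }
}
```

A helper of this shape reports every mismatching field at once rather than stopping at the first failed assertion, which may make a save/load discrepancy like the one reported here easier to spot.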