Marcus Levine created SPARK-33661:
-------------------------------------

             Summary: Unable to load RandomForestClassificationModel trained in Spark 2.x
                 Key: SPARK-33661
                 URL: https://issues.apache.org/jira/browse/SPARK-33661
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 3.0.1
            Reporter: Marcus Levine

Loading, in Spark 3.x, a RandomForestClassificationModel that was trained in Spark 2.x raises an exception:

{code:python}
...
RandomForestClassificationModel.load('/path/to/my/model')
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 330, in load
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 291, in load
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 280, in load
  File "/usr/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 134, in deco
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: No such struct field rawCount in id, prediction, impurity, impurityStats, gain, leftChild, rightChild, split;
{code}

There seems to be a schema incompatibility between the model data saved by Spark 2.x and the data Spark 3.x expects for a model it trained itself: the loader looks up a struct field named rawCount, which is not among the fields present in the saved data.

If this issue is not resolved, users will have to retrain in Spark 3.x every random forest model they trained in Spark 2.x before they can upgrade.
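
For anyone trying to confirm the same mismatch on their own saved model, here is a minimal sketch of a schema check. The field names come from the exception message above. The helper names, the `data_path` argument, and the struct column name `nodeData` are assumptions made for illustration, not something this report verifies.

```python
# Field names listed in the AnalysisException message for the Spark 2.x model.
SPARK2_NODE_FIELDS = [
    "id", "prediction", "impurity", "impurityStats",
    "gain", "leftChild", "rightChild", "split",
]

# The field the Spark 3.x loader asked for, per the same message.
REQUIRED_BY_SPARK3 = ["rawCount"]


def missing_node_fields(found, required=REQUIRED_BY_SPARK3):
    """Return the required field names that are absent from `found`."""
    present = set(found)
    return [name for name in required if name not in present]


def saved_node_fields(spark, data_path, struct_col="nodeData"):
    """List the field names of the node struct in a saved model's data.

    Assumptions (hypothetical): `data_path` is the Parquet data directory
    of the saved model, and the node struct column is called `nodeData`.
    """
    schema = spark.read.parquet(data_path).schema
    return schema[struct_col].dataType.fieldNames()


if __name__ == "__main__":
    # Check against the field list reported in the exception itself.
    print(missing_node_fields(SPARK2_NODE_FIELDS))
```

With a live SparkSession, `missing_node_fields(saved_node_fields(spark, data_path))` would report which fields the saved data lacks, provided the path and column name assumptions hold for the model in question.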