Marcus Levine created SPARK-33661:
-------------------------------------

             Summary: Unable to load RandomForestClassificationModel trained in Spark 2.x
                 Key: SPARK-33661
                 URL: https://issues.apache.org/jira/browse/SPARK-33661
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 3.0.1
            Reporter: Marcus Levine


When attempting to load a RandomForestClassificationModel that was trained in 
Spark 2.x using Spark 3.x, an exception is raised:

{code:python}
...
    RandomForestClassificationModel.load('/path/to/my/model')
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 330, in load
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 291, in load
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 280, in load
  File "/usr/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 134, in deco
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: No such struct field rawCount in id, prediction, impurity, impurityStats, gain, leftChild, rightChild, split;
{code}

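There seems to be a schema incompatibility between the model data saved by 
Spark 2.x and the data Spark 3.x expects when loading it: Spark 3.x looks for a 
rawCount field in each tree node, but the node data written by Spark 2.x does 
not contain one.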
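A minimal sketch for checking whether a saved model is affected. It assumes the saved model keeps its tree nodes in a Parquet dataset under the model's "data" subdirectory, with a "nodeData" struct column; that is an assumption about Spark's on-disk layout, not a documented API. The helper names saved_node_fields and missing_node_fields are hypothetical, and the Spark 2.x field list is copied from the error message above.

{code:python}
# Node-data field names reported by the AnalysisException above for a model
# saved by Spark 2.x. Spark 3.x additionally expects "rawCount".
SPARK2_NODE_FIELDS = ["id", "prediction", "impurity", "impurityStats",
                      "gain", "leftChild", "rightChild", "split"]


def missing_node_fields(saved_fields, required=("rawCount",)):
    """Return the required node-data fields that a saved model lacks."""
    present = set(saved_fields)
    return [name for name in required if name not in present]


def saved_node_fields(spark, model_path):
    """Read the nodeData struct field names of a saved tree-ensemble model.

    Hypothetical helper. ASSUMPTION: node data lives in a Parquet dataset at
    <model_path>/data with a struct column named "nodeData".
    """
    df = spark.read.parquet(model_path.rstrip("/") + "/data")
    return df.schema["nodeData"].dataType.fieldNames()


# Without a Spark session, check the field list from the error message:
print(missing_node_fields(SPARK2_NODE_FIELDS))  # ['rawCount']
{code}

With a live SparkSession, the result of saved_node_fields(spark, '/path/to/my/model') can be passed to missing_node_fields; a non-empty result suggests the model was written without the field Spark 3.x expects.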

If this issue is not resolved, users will have to retrain every random forest 
model they trained in Spark 2.x, this time with Spark 3.x, before they can 
upgrade.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
