Matthew Bedford created SPARK-29235:
---------------------------------------

Summary: CrossValidatorModel.avgMetrics disappears after model is written/read again
Key: SPARK-29235
URL: https://issues.apache.org/jira/browse/SPARK-29235
Project: Spark
Issue Type: Bug
Components: ML
Affects Versions: 2.4.1
Environment: Databricks cluster:
{
  "num_workers": 4,
  "cluster_name": "mabedfor-test-classfix",
  "spark_version": "5.3.x-cpu-ml-scala2.11",
  "spark_conf": {
    "spark.databricks.delta.preview.enabled": "true"
  },
  "node_type_id": "Standard_DS12_v2",
  "driver_node_type_id": "Standard_DS12_v2",
  "ssh_public_keys": [],
  "custom_tags": {},
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "autotermination_minutes": 120,
  "enable_elastic_disk": true,
  "cluster_source": "UI",
  "init_scripts": [],
  "cluster_id": "0722-165622-calls746"
}
Reporter: Matthew Bedford

Right after a CrossValidatorModel is trained, it has avgMetrics. After the model is written to disk and read later, it no longer has avgMetrics.

To reproduce:

    from pyspark.ml.tuning import CrossValidator, CrossValidatorModel

    cv = CrossValidator(...)  # fill with params
    cvModel = cv.fit(trainDF)  # given dataframe with training data
    print(cvModel.avgMetrics)  # prints a nonempty list as expected
    cvModel.write().save("/tmp/model")
    cvModel2 = CrossValidatorModel.read().load("/tmp/model")
    print(cvModel2.avgMetrics)  # BUG - prints an empty list
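
Until the loaded model restores avgMetrics, one possible workaround is to store the metrics in a small sidecar file next to the saved model and read them back separately. The sketch below is only an illustration, not part of Spark: the helper names `save_avg_metrics` and `load_avg_metrics` and the sidecar file name are hypothetical, and it assumes the sidecar path is on a filesystem the driver can read and write directly.

```python
import json
import os
import tempfile


def save_avg_metrics(avg_metrics, path):
    """Write a list of average metrics to a JSON sidecar file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        # Coerce to plain floats so numpy scalars, if any, serialize cleanly.
        json.dump([float(m) for m in avg_metrics], f)


def load_avg_metrics(path):
    """Read the list of average metrics back from the JSON sidecar file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Stand-in values; in the scenario above this would be cvModel.avgMetrics.
    metrics = [0.81, 0.79, 0.84]
    with tempfile.TemporaryDirectory() as tmp_dir:
        sidecar = os.path.join(tmp_dir, "avg_metrics.json")  # assumed file name
        save_avg_metrics(metrics, sidecar)
        print(load_avg_metrics(sidecar))
```

In the reproduction above, this would mean calling `save_avg_metrics(cvModel.avgMetrics, ...)` right after `cvModel.write().save(...)`, and `load_avg_metrics(...)` after loading `cvModel2`. The loaded model object itself is left untouched.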