Bobby Wang created SPARK-50974:
----------------------------------
Summary: CrossValidator: foldCol is not supported
Key: SPARK-50974
URL: https://issues.apache.org/jira/browse/SPARK-50974
Project: Spark
Issue Type: Sub-task
Components: Connect, ML, PySpark
Affects Versions: 4.0.0, 4.1
Reporter: Bobby Wang
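For context, the `_kFold` frame in the traceback filters the dataset on `foldCol` once per fold. Below is a minimal pure-Python sketch of that kind of user-specified fold split. It is not the PySpark implementation: `k_fold_by_column` and the sample rows are hypothetical, and the range check is only an assumed stand-in for `checker_udf`, whose body is not shown in this report.

```python
def k_fold_by_column(rows, fold_col, num_folds):
    """Split rows (a list of dicts) into (training, validation) pairs by fold index."""
    # Assumed stand-in for checker_udf: reject fold values outside [0, num_folds).
    for row in rows:
        fold = row[fold_col]
        if not (isinstance(fold, int) and 0 <= fold < num_folds):
            raise ValueError(
                f"Fold number must be in range [0, {num_folds}), but got {fold}."
            )
    splits = []
    for i in range(num_folds):
        # Mirrors the filter in the traceback: training keeps rows whose fold != i.
        training = [r for r in rows if r[fold_col] != i]
        validation = [r for r in rows if r[fold_col] == i]
        splits.append((training, validation))
    return splits


# Hypothetical sample data with a user-supplied "fold" column.
rows = [
    {"x": 1, "fold": 0},
    {"x": 2, "fold": 1},
    {"x": 3, "fold": 0},
    {"x": 4, "fold": 2},
]
splits = k_fold_by_column(rows, "fold", 3)
```

Each element of `splits` is one (training, validation) pair; in the failing call, the first of the two filter conditions (the `checker_udf` call) is where the error is raised.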
error msg:
    cvModel2 = cv_with_user_folds.fit(dataset_with_folds)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/work.d/spark/spark-master/python/pyspark/ml/base.py", line 203, in fit
    return self._fit(dataset)
           ^^^^^^^^^^^^^^^^^^
  File "/home/xxx/work.d/spark/spark-master/python/pyspark/ml/tuning.py", line 848, in _fit
    datasets = self._kFold(dataset)
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/work.d/spark/spark-master/python/pyspark/ml/tuning.py", line 906, in _kFold
    training = dataset.filter(checker_udf(dataset[foldCol]) & (col(foldCol) != lit(i)))
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/work.d/spark/spark-master/python/pyspark/sql/udf.py", line 405, in __call__
    jcols = [_to_java_column(arg) for arg in args] + [
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/work.d/spark/spark-master/python/pyspark/sql/udf.py", line 405, in <listcomp>
    jcols = [_to_java_column(arg) for arg in args] + [
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/work.d/spark/spark-master/python/pyspark/sql/classic/column.py", line 71, in _to_java_column
    raise PySparkTypeError(
pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.
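
The message "should be a Column or str, got Column" looks contradictory. One plausible reading, which this report does not state explicitly, is that the argument is a Spark Connect Column while the check in `classic/column.py` tests against the classic Column class. The two classes share a name but are unrelated types. The pure-Python sketch below reproduces only that naming effect: `ClassicColumn`, `ConnectColumn`, and `to_java_column_sketch` are hypothetical stand-ins, not PySpark code.

```python
# Two unrelated classes that both report the name "Column" (hypothetical stand-ins
# for a classic Column and a Spark Connect Column).
ClassicColumn = type("Column", (), {})
ConnectColumn = type("Column", (), {})


def to_java_column_sketch(col):
    """Simplified stand-in for the type check in the last traceback frame."""
    if isinstance(col, (ClassicColumn, str)):
        return col
    # The message is built from the class name only, so a different class
    # that is also named "Column" yields "got Column".
    raise TypeError(
        "[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, "
        f"got {type(col).__name__}."
    )


try:
    to_java_column_sketch(ConnectColumn())
except TypeError as exc:
    message = str(exc)
    print(message)
```

Under this assumption, the fix would involve keeping the `foldCol` filter on a single Column implementation rather than changing the error message.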