WeichenXu123 commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616356650



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       @HyukjinKwon 
   When no return type is specified for a udf, how does return type inference 
work? Does it check the udf's return type for every row?
   The master code fails in some cases, and the type of the returned column in 
the schema becomes "String".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
