dtenedor commented on code in PR #43356:
URL: https://github.com/apache/spark/pull/43356#discussion_r1364492809


##########
python/pyspark/worker.py:
##########
@@ -841,6 +845,63 @@ def _remove_partition_by_exprs(self, arg: Any) -> Any:
             "the query again."
         )
 
+    # Compares each UDTF output row against the output schema for this particular UDTF call,
+    # raising an error if the two are incompatible.
+    def check_output_row_against_schema(row: Any) -> None:

Review Comment:
   Note: In a previous iteration of this PR, I enabled this check only when the 
schema contained non-nullable columns. However, I would like to later extend 
these checks to compare the provided row values against the expected output 
schema column types; a mismatch there currently produces an internal exception 
instead of a good error message. That would require checking every value in 
every row anyway, so I figured it was OK to do the same here.
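
   For illustration only, here is a minimal sketch of what a per-value check of this kind might look like once it also covers column types. It is not the code in this PR: the `Column` dataclass, the two-argument signature, and the error messages are hypothetical stand-ins, and it uses plain Python types rather than Spark's schema classes.

```python
from dataclasses import dataclass
from typing import Any, Sequence


# Hypothetical stand-in for one output schema column; not a PySpark class.
@dataclass(frozen=True)
class Column:
    name: str
    py_type: type
    nullable: bool = True


def check_output_row_against_schema(row: Sequence[Any], schema: Sequence[Column]) -> None:
    """Raise a descriptive error if `row` is incompatible with `schema`."""
    if len(row) != len(schema):
        raise ValueError(
            f"UDTF output row has {len(row)} values, "
            f"but the output schema has {len(schema)} columns"
        )
    # Every value in the row is visited, regardless of nullability.
    for value, column in zip(row, schema):
        if value is None:
            if not column.nullable:
                raise ValueError(
                    f"Column '{column.name}' is non-nullable, "
                    "but the output row contains None"
                )
            continue
        if not isinstance(value, column.py_type):
            raise TypeError(
                f"Column '{column.name}' expects {column.py_type.__name__}, "
                f"but got {type(value).__name__}"
            )
```

   Because the type comparison has to look at every value anyway, the nullability check adds no extra pass over the row, which is the reasoning behind running it unconditionally.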


