[
https://issues.apache.org/jira/browse/SPARK-52821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon reassigned SPARK-52821:
------------------------------------
Assignee: Ben Hurdelhey
> Support int to DecimalType return type coercion in Pandas UDFs (useArrow=True)
> ------------------------------------------------------------------------------
>
> Key: SPARK-52821
> URL: https://issues.apache.org/jira/browse/SPARK-52821
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.1.0
> Reporter: Ben Hurdelhey
> Assignee: Ben Hurdelhey
> Priority: Minor
> Labels: pull-request-available
> Attachments: Screenshot 2025-07-16 at 11.49.31.png
>
>
> Problem: PySpark UDFs with useArrow=True do not support type coercion from
> int to DecimalType if the target precision of the DecimalType is too low,
> even when the returned value itself would fit.
> Example:
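> {code:python}
> from pyspark.sql.functions import col, udf
> from pyspark.sql.types import DecimalType
>
> @udf(returnType=DecimalType(2, 1), useArrow=True)
> def test(x):
>     # Returns a plain Python int, although the declared type is DecimalType(2, 1)
>     return 1
>
> spark.range(1,2,1,1).select(test(col('id'))).display() # expected: (Decimal) 1.0
> {code}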
> throws
> {code}
> pyarrow.lib.ArrowInvalid: Precision is not great enough for the result. It should be at least 20
> {code}
>
> For a better overview of the current behavior, check out this publicly
> available
> [notebook|https://www.databricks.com/wp-content/uploads/notebooks/python-udf-type-coercion.html],
> with the proposed change highlighted in the screenshot.
>
> Proposed solution: Add integer to decimal conversion for PySpark UDF return
> types. This is a net-new use case: it was not supported previously (it threw
> an error), so this is not a breaking change.
>
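To make the difference between the current error and the proposed coercion concrete, here is a minimal pure-Python sketch. It is not the PySpark or PyArrow implementation. Both helper names (`type_based_min_precision`, `int_to_decimal`) are hypothetical, and the reading that the "at least 20" in the error comes from the 19 decimal digits of a 64-bit integer plus the target scale of 1 is an assumption based on the message above.

```python
from decimal import Decimal

# Assumption: a cast that only looks at types must reserve room for the
# widest possible 64-bit integer, which has 19 decimal digits.
INT64_MAX_DIGITS = len(str(2**63 - 1))


def type_based_min_precision(scale: int) -> int:
    """Hypothetical: precision a type-only int64 -> decimal cast would demand."""
    return INT64_MAX_DIGITS + scale


def int_to_decimal(value: int, precision: int, scale: int) -> Decimal:
    """Hypothetical value-based coercion of an int to Decimal(precision, scale).

    Checks the actual value rather than the widest possible int, then
    builds the Decimal exactly with `scale` fractional zero digits.
    """
    if scale < 0 or precision < scale:
        raise ValueError("expected 0 <= scale <= precision")
    # Digits this particular value needs left of the decimal point.
    integer_digits = 0 if value == 0 else len(str(abs(value)))
    if integer_digits > precision - scale:
        raise ValueError(
            f"{value} does not fit in Decimal({precision}, {scale})"
        )
    sign = 1 if value < 0 else 0
    digits = tuple(int(d) for d in str(abs(value))) + (0,) * scale
    return Decimal((sign, digits, -scale))


print(type_based_min_precision(1))   # precision a type-only check would ask for
print(int_to_decimal(1, 2, 1))       # value-based coercion of the example above
```

Under these assumptions, the value `1` from the example fits into `DecimalType(2, 1)` once the check looks at the value instead of the integer type, while a value such as `10` would still be rejected because it needs two digits before the decimal point.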
--
This message was sent by Atlassian Jira
(v8.20.10#820010)