L. C. Hsieh created SPARK-55802:
-----------------------------------
Summary: Fix integer overflow when computing Arrow batch bytes
Key: SPARK-55802
URL: https://issues.apache.org/jira/browse/SPARK-55802
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.1.1, 4.0.2, 4.2.0
Reporter: L. C. Hsieh
We have two SQL configs, ARROW_EXECUTION_MAX_BYTES_PER_BATCH and
ARROW_EXECUTION_MAX_BYTES_PER_OUTPUT_BATCH, that control the maximum number of
bytes per Arrow batch when dealing with Python UDFs. Both are integers. We
calculate the byte size of each column (an integer) and sum these sizes before
comparing the total with the configured limit. However, the running sum is
likely to exceed Integer's maximum value and overflow, so the final total is
incorrect and the comparison with the limit becomes unreliable.
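
To illustrate the failure mode, here is a minimal Java sketch. It is not the
actual Spark code: the class BatchBytesOverflow, the helpers sumAsInt and
sumAsLong, the column sizes, and the limit are all hypothetical and chosen only
to show how an int accumulator wraps around while a long accumulator does not.

```java
final class BatchBytesOverflow {
    // Hypothetical helper: sums per-column byte sizes in an int accumulator.
    // The total silently wraps around once it exceeds Integer.MAX_VALUE.
    static int sumAsInt(int[] columnBytes) {
        int total = 0;
        for (int bytes : columnBytes) {
            total += bytes;
        }
        return total;
    }

    // Hypothetical helper: the same sum accumulated in a long, which cannot
    // overflow for any realistic number of int-sized column sizes.
    static long sumAsLong(int[] columnBytes) {
        long total = 0L;
        for (int bytes : columnBytes) {
            total += bytes;
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative limit; the real configs are integers as well.
        int maxBytesPerBatch = Integer.MAX_VALUE;
        // Two columns of 1.5 GB each: each size fits in an int, the sum does not.
        int[] columnBytes = {1_500_000_000, 1_500_000_000};

        int wrapped = sumAsInt(columnBytes);
        long exact = sumAsLong(columnBytes);

        System.out.println(wrapped);                    // -1294967296
        System.out.println(exact);                      // 3000000000
        System.out.println(wrapped > maxBytesPerBatch); // false: limit check is missed
        System.out.println(exact > maxBytesPerBatch);   // true: limit check fires
    }
}
```

Accumulating in a long is one plausible direction for a fix. Another option
would be Math.addExact, which throws ArithmeticException on overflow instead of
wrapping silently.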