L. C. Hsieh created SPARK-55802:
-----------------------------------

             Summary: Fix integer overflow when computing Arrow batch bytes
                 Key: SPARK-55802
                 URL: https://issues.apache.org/jira/browse/SPARK-55802
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.1.1, 4.0.2, 4.2.0
            Reporter: L. C. Hsieh


Two SQL configs, ARROW_EXECUTION_MAX_BYTES_PER_BATCH and
ARROW_EXECUTION_MAX_BYTES_PER_OUTPUT_BATCH, control the maximum number of
bytes per Arrow batch when dealing with Python UDFs. Both are integers. We
calculate the bytes of each column (an integer) and sum them before comparing
the sum with the configured limit. However, the summation can exceed
Integer's maximum value and overflow, leaving the final sum incorrect.
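
To illustrate the failure mode, here is a minimal standalone sketch in Java.
It is not the Spark code: the class BatchBytes, its methods, and the sample
sizes are hypothetical names and values chosen for illustration. It assumes
per-column sizes arrive as ints, as described above, and shows that
accumulating them in a long keeps the comparison against the limit correct.

```java
// Hypothetical illustration only; not taken from the Spark code base.
final class BatchBytes {

    // Buggy pattern: accumulating per-column sizes in a 32-bit int.
    // The sum wraps around silently once it passes Integer.MAX_VALUE.
    static int sumAsInt(int[] columnBytes) {
        int total = 0;
        for (int bytes : columnBytes) {
            total += bytes;
        }
        return total;
    }

    // Safer pattern: accumulate in a 64-bit long. Each int is widened
    // to long before the addition, so the sum cannot wrap here.
    static long sumAsLong(int[] columnBytes) {
        long total = 0L;
        for (int bytes : columnBytes) {
            total += bytes;
        }
        return total;
    }

    // Compare the widened sum with the (integer) configured limit.
    static boolean exceedsLimit(int[] columnBytes, int maxBytesPerBatch) {
        return sumAsLong(columnBytes) > maxBytesPerBatch;
    }

    public static void main(String[] args) {
        // Two columns of 1.5 GB each: each size fits in an int, the sum does not.
        int[] columnBytes = {1_500_000_000, 1_500_000_000};
        int limit = 1_000_000_000; // hypothetical configured limit

        System.out.println(sumAsInt(columnBytes));   // -1294967296 (wrapped)
        System.out.println(sumAsLong(columnBytes));  // 3000000000

        // The wrapped int sum is negative, so it never looks over the limit.
        System.out.println(sumAsInt(columnBytes) > limit);     // false
        System.out.println(exceedsLimit(columnBytes, limit));  // true
    }
}
```

With the int accumulator the batch appears to be under the limit even though
it is three times larger. Where widening is not an option, Math.addExact
could be used instead to raise an ArithmeticException on overflow.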



