[ https://issues.apache.org/jira/browse/SPARK-40403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-40403:
------------------------------------

    Assignee: Bruce Robbins

> Negative size in error message when unsafe array is too big
> -----------------------------------------------------------
>
>                 Key: SPARK-40403
>                 URL: https://issues.apache.org/jira/browse/SPARK-40403
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Minor
>
> When initializing an overly large unsafe array via 
> {{UnsafeArrayWriter#initialize}}, {{BufferHolder#grow}} may report an error 
> message with a negative size, e.g.:
> {noformat}
> java.lang.IllegalArgumentException: Cannot grow BufferHolder by size -2115263656 because the size is negative
> {noformat}
> (Note: This is not related to SPARK-39608, as far as I can tell, despite 
> having the same symptom).
> When calculating the initial size in bytes needed for the array, 
> {{UnsafeArrayWriter#initialize}} uses an int expression, which can overflow. 
> The initialize method then passes the resulting negative size to 
> {{BufferHolder#grow}}, which throws the exception above.
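> The overflowing computation looks roughly like this (a paraphrased sketch 
> from memory, not the exact source):
> {noformat}
> // In UnsafeArrayWriter#initialize (paraphrased): every term here is an int,
> // so both the multiplication and the addition are evaluated in 32-bit
> // arithmetic, and the total can silently wrap negative.
> int fixedPartInBytes =
>   ByteArrayMethods.roundNumberOfBytesToNearestWord(elementSize * numElements);
> holder.grow(headerInBytes + fixedPartInBytes);  // may receive a negative int
> {noformat}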
> Example (the following will run just fine on a 16GB laptop, despite the large 
> driver memory setting):
> {noformat}
> bin/spark-sql --driver-memory 22g --master "local[1]"
> create or replace temp view data1 as
> select 0 as key, id as val
> from range(0, 268271216);
> create or replace temp view data2 as
> select key as lkey, collect_list(val) as bigarray
> from data1
> group by key;
> -- the below cache forces Spark to create unsafe rows
> cache lazy table data2;
> select count(*) from data2;
> {noformat}
> After a few minutes, {{BufferHolder#grow}} will throw the following exception:
> {noformat}
> java.lang.IllegalArgumentException: Cannot grow BufferHolder by size -2115263656 because the size is negative
>       at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:67)
>       at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter.initialize(UnsafeArrayWriter.java:61)
>       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>       at org.apache.spark.sql.catalyst.expressions.aggregate.Collect.serialize(collect.scala:73)
>       at org.apache.spark.sql.catalyst.expressions.aggregate.Collect.serialize(collect.scala:37)
> {noformat}
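> For the record, the numbers line up with a 32-bit wraparound of the total 
> initial size (a back-of-the-envelope check; the exact header layout is an 
> assumption on my part):
> {noformat}
> long fixedPart = 268271216L * 8;      // 2,146,169,728 bytes of long values
> long header    = 8 + 33533904;        // numElements field + null bits for
>                                       // 268,271,216 elements, word-aligned
> long total     = header + fixedPart;  // 2,179,703,640 bytes
> // 2,179,703,640 exceeds Integer.MAX_VALUE (2,147,483,647); as an int it
> // wraps to 2,179,703,640 - 2^32 = -2,115,263,656, the reported size.
> {noformat}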
> This query was going to fail anyway, but the message makes it look like a 
> bug in Spark rather than a user problem. {{UnsafeArrayWriter#initialize}} 
> should calculate the initial size using a long expression and fail if the 
> size exceeds {{Integer.MAX_VALUE}}, showing the actual requested size in the 
> error message.
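> A minimal sketch of such a guard (hypothetical shape; the actual patch may 
> differ):
> {noformat}
> // Do the size math in 64-bit arithmetic and fail fast with the real value
> // instead of letting the int wrap negative.
> long fixedPartInBytes = ((long) elementSize * numElements + 7) / 8 * 8;
> long totalInitialSize = headerInBytes + fixedPartInBytes;
> if (totalInitialSize > Integer.MAX_VALUE) {
>   throw new UnsupportedOperationException("Cannot initialize unsafe array of "
>     + numElements + " elements: required " + totalInitialSize
>     + " bytes exceeds " + Integer.MAX_VALUE);
> }
> holder.grow((int) totalInitialSize);
> {noformat}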



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
