Bruce Robbins created SPARK-40403:
-------------------------------------

             Summary: Negative size in error message when unsafe array is too big
                 Key: SPARK-40403
                 URL: https://issues.apache.org/jira/browse/SPARK-40403
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Bruce Robbins


When initializing an overly large unsafe array via 
{{UnsafeArrayWriter#initialize}}, {{BufferHolder#grow}} may report an error 
message with a negative size, e.g.:
{noformat}
java.lang.IllegalArgumentException: Cannot grow BufferHolder by size -2115263656 because the size is negative
{noformat}
(Note: as far as I can tell, this is not related to SPARK-39608, despite the identical symptom.)

When calculating the initial size in bytes needed for the array, 
{{UnsafeArrayWriter#initialize}} uses an int expression, which can overflow 
for large arrays. The method then passes the resulting negative size to 
{{BufferHolder#grow}}, which rejects it with the error above.
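
For illustration, here is a minimal standalone sketch of the overflowing arithmetic. This is not the actual {{UnsafeArrayWriter}} code; the header layout (an 8-byte element count plus one 8-byte word of null-tracking bits per 64 elements) is my reading of the {{UnsafeArrayData}} format, and the constants mirror the repro below:
{noformat}
public class ArraySizeOverflow {
  public static void main(String[] args) {
    int numElements = 268271216; // array length from the repro below
    int elementSize = 8;         // bytes per element for long values

    // 8-byte length word plus one 8-byte word of null bits per 64 elements
    int headerInBytes = 8 + ((numElements + 63) / 64) * 8; // 33,533,912

    // 33,533,912 + 2,146,169,728 = 2,179,703,640, which exceeds
    // Integer.MAX_VALUE (2,147,483,647) and wraps around
    int totalSizeInBytes = headerInBytes + elementSize * numElements;
    System.out.println(totalSizeInBytes); // prints -2115263656
  }
}
{noformat}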

Example (the following will run just fine on a 16GB laptop, despite the large 
driver memory setting):
{noformat}
bin/spark-sql --driver-memory 22g --master "local[1]"

create or replace temp view data1 as
select 0 as key, id as val
from range(0, 268271216);

create or replace temp view data2 as
select key as lkey, collect_list(val) as bigarray
from data1
group by key;

-- the cache below forces Spark to create unsafe rows
cache lazy table data2;

select count(*) from data2;
{noformat}
After a few minutes, {{UnsafeArrayWriter#initialize}} will throw the following 
exception:
{noformat}
java.lang.IllegalArgumentException: Cannot grow BufferHolder by size -2115263656 because the size is negative
        at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:67)
        at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter.initialize(UnsafeArrayWriter.java:61)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.aggregate.Collect.serialize(collect.scala:73)
        at org.apache.spark.sql.catalyst.expressions.aggregate.Collect.serialize(collect.scala:37)
{noformat}
This query was going to fail anyway, but the message makes it look like a bug 
in Spark rather than a user problem. {{UnsafeArrayWriter#initialize}} should 
calculate the size using a long expression and fail if it exceeds 
{{Integer.MAX_VALUE}}, showing the actual initial size in the error message.
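
A rough sketch of such a guard (illustrative only, not an actual patch; the method shape and message wording are placeholders):
{noformat}
// Compute the required size with long arithmetic so an overflow is
// detected and reported instead of silently wrapping to a negative int.
static void initializeChecked(int numElements, int elementSize) {
  long headerInBytes = 8L + ((numElements + 63L) / 64L) * 8L;
  long totalSizeInBytes = headerInBytes + (long) elementSize * numElements;
  if (totalSizeInBytes > Integer.MAX_VALUE) {
    throw new UnsupportedOperationException(
        "Cannot initialize unsafe array: required size " + totalSizeInBytes +
        " bytes exceeds " + Integer.MAX_VALUE);
  }
  // the value now fits in an int and can be passed to BufferHolder#grow
}
{noformat}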



