[ https://issues.apache.org/jira/browse/SPARK-40403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-40403:
------------------------------------

    Assignee: Apache Spark

> Negative size in error message when unsafe array is too big
> -----------------------------------------------------------
>
>                 Key: SPARK-40403
>                 URL: https://issues.apache.org/jira/browse/SPARK-40403
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Bruce Robbins
>            Assignee: Apache Spark
>            Priority: Minor
>
> When initializing an overly large unsafe array via {{UnsafeArrayWriter#initialize}}, {{BufferHolder#grow}} may report an error message with a negative size, e.g.:
> {noformat}
> java.lang.IllegalArgumentException: Cannot grow BufferHolder by size -2115263656 because the size is negative
> {noformat}
> (Note: This is not related to SPARK-39608, as far as I can tell, despite having the same symptom.)
> When calculating the initial size in bytes needed for the array, {{UnsafeArrayWriter#initialize}} uses an int expression, which can overflow; a sketch of the arithmetic follows this message. The initialize method then passes the negative size to {{BufferHolder#grow}}, which complains about the negative size.
> Example (the following will run just fine on a 16GB laptop, despite the large driver memory setting):
> {noformat}
> bin/spark-sql --driver-memory 22g --master "local[1]"
>
> create or replace temp view data1 as
> select 0 as key, id as val
> from range(0, 268271216);
>
> create or replace temp view data2 as
> select key as lkey, collect_list(val) as bigarray
> from data1
> group by key;
>
> -- the below cache forces Spark to create unsafe rows
> cache lazy table data2;
>
> select count(*) from data2;
> {noformat}
> After a few minutes, {{BufferHolder#grow}} will throw the following exception:
> {noformat}
> java.lang.IllegalArgumentException: Cannot grow BufferHolder by size -2115263656 because the size is negative
>   at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:67)
>   at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter.initialize(UnsafeArrayWriter.java:61)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.aggregate.Collect.serialize(collect.scala:73)
>   at org.apache.spark.sql.catalyst.expressions.aggregate.Collect.serialize(collect.scala:37)
> {noformat}
> This query was going to fail anyway, but the message makes it look like a bug in Spark rather than a user problem. {{UnsafeArrayWriter#initialize}} should calculate the size using a long expression and fail if it exceeds {{Integer.MAX_VALUE}}, showing the actual initial size in the error message.
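
For illustration, here is a minimal Java sketch of the int overflow described above. The header formula (an 8-byte element count plus a null-tracking bitmap rounded up to 8-byte words) and the 8-byte element size are assumptions about the unsafe array layout, not code copied from {{UnsafeArrayWriter}}:

{noformat}
public class UnsafeArraySizeOverflow {
    public static void main(String[] args) {
        int numElements = 268271216;   // element count from the repro query
        int elementSize = 8;           // assuming 8-byte (long) elements

        // Assumed header layout: 8 bytes for the element count, plus one
        // null bit per element, rounded up to whole 8-byte words.
        int headerInBytes = 8 + ((numElements + 63) / 64) * 8;

        // Fixed-width region: elementSize * numElements. The product alone
        // still fits in an int here, but adding the header pushes the total
        // past Integer.MAX_VALUE, and int arithmetic silently wraps around.
        int totalInt = headerInBytes + elementSize * numElements;
        System.out.println(totalInt);  // prints -2115263656

        // The same calculation in long arithmetic preserves the real size,
        // so the caller can fail with an accurate message instead of
        // passing a negative number to BufferHolder#grow.
        long totalLong = (long) headerInBytes + (long) elementSize * numElements;
        if (totalLong > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException(
                "Cannot initialize unsafe array of " + totalLong
                + " bytes because it exceeds Integer.MAX_VALUE");
        }
    }
}
{noformat}

Under these assumed formulas the wrapped int comes out to exactly the -2115263656 seen in the exception, which is consistent with the total size (header plus element data) overflowing rather than the element product alone.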