[ https://issues.apache.org/jira/browse/SPARK-43393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-43393:
----------------------------------
    Fix Version/s: 3.3.4

> Sequence expression can overflow
> --------------------------------
>
>                 Key: SPARK-43393
>                 URL: https://issues.apache.org/jira/browse/SPARK-43393
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Deepayan Patra
>            Assignee: Deepayan Patra
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.2, 4.0.0, 3.5.1, 3.3.4
>
> Spark has a long-standing overflow bug in the {{sequence}} expression.
>
> Consider the following operations:
> {{spark.sql("CREATE TABLE foo (l LONG);")}}
> {{spark.sql(s"INSERT INTO foo VALUES (${Long.MaxValue});")}}
> {{spark.sql("SELECT sequence(0, l) FROM foo;").collect()}}
>
> The result of these operations is:
> {{Array[org.apache.spark.sql.Row] = Array([WrappedArray()])}}
> an empty array, an unintended consequence of overflow.
>
> The sequence is applied to values {{0}} and {{Long.MaxValue}} with a step size of {{1}}, which uses a length computation defined [here|https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3451]. In this calculation, with {{start = 0}}, {{stop = Long.MaxValue}}, and {{step = 1}}, the calculated {{len}} overflows to {{Long.MinValue}}.
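The overflowing computation can be sketched as a simplified Java analogue (the method names here are illustrative, not Spark's actual code; `Math.subtractExact`/`Math.addExact` show one standard-library way to detect the wrap-around instead of silently truncating):

```java
public class SequenceLengthOverflow {

    // Simplified analogue of the unchecked length computation described above:
    // len = (stop - start) / step + 1 silently wraps for extreme Long values.
    public static long unsafeLen(long start, long stop, long step) {
        return (stop - start) / step + 1;
    }

    // Hypothetical checked variant: the exact-arithmetic methods throw
    // ArithmeticException on overflow instead of wrapping around.
    public static long safeLen(long start, long stop, long step) {
        return Math.addExact(Math.subtractExact(stop, start) / step, 1);
    }

    public static void main(String[] args) {
        long len = unsafeLen(0L, Long.MAX_VALUE, 1L);
        System.out.println(len);       // -9223372036854775808 (Long.MIN_VALUE)
        System.out.println((int) len); // 0 -- the toInt cast truncates the upper bits

        try {
            safeLen(0L, Long.MAX_VALUE, 1L);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```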
> The computation, in binary, looks like:
> {{0111111111111111111111111111111111111111111111111111111111111111 -}}
> {{0000000000000000000000000000000000000000000000000000000000000000}}
> {{----------------------------------------------------------------}}
> {{0111111111111111111111111111111111111111111111111111111111111111 /}}
> {{0000000000000000000000000000000000000000000000000000000000000001}}
> {{----------------------------------------------------------------}}
> {{0111111111111111111111111111111111111111111111111111111111111111 +}}
> {{0000000000000000000000000000000000000000000000000000000000000001}}
> {{----------------------------------------------------------------}}
> {{1000000000000000000000000000000000000000000000000000000000000000}}
>
> The following [check|https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3454] passes because the negative {{Long.MinValue}} is still {{<= MAX_ROUNDED_ARRAY_LENGTH}}. The subsequent cast via {{toInt}} then [truncates the upper bits|https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3457], producing a length of 0 and thus an empty array.
>
> Other overflows are similarly problematic.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)