[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kris Mok updated SPARK-21041:
-----------------------------
    Description: 
When whole-stage codegen is enabled, SparkSession.range()'s behavior in the face of integer overflow is inconsistent with its behavior when codegen is turned off; the latter is consistent with SparkContext.range()'s behavior.

The following Spark Shell session shows the inconsistency:
{code:java}
scala> sc.range
def range(start: Long,end: Long,step: Long,numSlices: Int): org.apache.spark.rdd.RDD[Long]

scala> spark.range
def range(start: Long,end: Long,step: Long,numPartitions: Int): org.apache.spark.sql.Dataset[Long]
def range(start: Long,end: Long,step: Long): org.apache.spark.sql.Dataset[Long]
def range(start: Long,end: Long): org.apache.spark.sql.Dataset[Long]
def range(end: Long): org.apache.spark.sql.Dataset[Long]

scala> sc.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res1: Array[Long] = Array()

scala> spark.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res2: Array[Long] = Array(9223372036854775804, 9223372036854775805, 9223372036854775806)

scala> spark.conf.set("spark.sql.codegen.wholeStage", false)

scala> spark.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res5: Array[Long] = Array()
{code}

  was:
When whole-stage codegen is enabled, SparkSession.range()'s behavior in the face of integer overflow is inconsistent with its behavior when codegen is turned off; the latter is consistent with SparkContext.range()'s behavior.
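For context on why the codegen path can emit elements for a logically empty range: with these bounds, an element count computed by plain 64-bit subtraction wraps around to a small positive value. This is a minimal sketch of the arithmetic in plain Java, not Spark's actual generated code:

```java
public class RangeOverflowDemo {
    public static void main(String[] args) {
        long start = Long.MAX_VALUE - 3;  // 9223372036854775804
        long end = Long.MIN_VALUE + 2;    // -9223372036854775806

        // Logically the range is empty: end < start with a positive step.
        System.out.println(end < start);  // true

        // But a naive element count computed as (end - start) overflows
        // and wraps around to a small positive value instead of a
        // negative one, so a generated loop can still emit elements.
        long naiveCount = end - start;
        System.out.println(naiveCount);   // 6
    }
}
```

Checking `end < start` up front (or using overflow-checked arithmetic such as `Math.subtractExact`) would report the range as empty, matching what sc.range and the non-codegen path return above.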
The following Spark Shell session shows the inconsistency:
{code:scala}
scala> sc.range
def range(start: Long,end: Long,step: Long,numSlices: Int): org.apache.spark.rdd.RDD[Long]

scala> spark.range
def range(start: Long,end: Long,step: Long,numPartitions: Int): org.apache.spark.sql.Dataset[Long]
def range(start: Long,end: Long,step: Long): org.apache.spark.sql.Dataset[Long]
def range(start: Long,end: Long): org.apache.spark.sql.Dataset[Long]
def range(end: Long): org.apache.spark.sql.Dataset[Long]

scala> sc.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res1: Array[Long] = Array()

scala> spark.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res2: Array[Long] = Array(9223372036854775804, 9223372036854775805, 9223372036854775806)

scala> spark.conf.set("spark.sql.codegen.wholeStage", false)

scala> spark.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res5: Array[Long] = Array()
{code}

> With whole-stage codegen, SparkSession.range()'s behavior is inconsistent
> with SparkContext.range()
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21041
>                 URL: https://issues.apache.org/jira/browse/SPARK-21041
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Kris Mok
>
> When whole-stage codegen is enabled, SparkSession.range()'s behavior in the
> face of integer overflow is inconsistent with its behavior when codegen is
> turned off; the latter is consistent with SparkContext.range()'s behavior.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org