Kris Mok created SPARK-21041:
--------------------------------

             Summary: With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()
                 Key: SPARK-21041
                 URL: https://issues.apache.org/jira/browse/SPARK-21041
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Kris Mok
When whole-stage codegen is enabled, SparkSession.range() handles integer overflow differently than it does with codegen turned off, and the codegen-disabled behavior is the one that matches SparkContext.range(). With start = Long.MAX_VALUE - 3, end = Long.MIN_VALUE + 2 and step = 1, start is already greater than end while the step is positive, so the result should be empty. SparkContext.range() and SparkSession.range() without codegen both return an empty array, but with whole-stage codegen SparkSession.range() returns the three values Long.MAX_VALUE - 3 through Long.MAX_VALUE - 1.

The following Spark shell session shows the inconsistency:

{code:scala}
scala> sc.range
def range(start: Long,end: Long,step: Long,numSlices: Int): org.apache.spark.rdd.RDD[Long]

scala> spark.range
def range(start: Long,end: Long,step: Long,numPartitions: Int): org.apache.spark.sql.Dataset[Long]
def range(start: Long,end: Long,step: Long): org.apache.spark.sql.Dataset[Long]
def range(start: Long,end: Long): org.apache.spark.sql.Dataset[Long]
def range(end: Long): org.apache.spark.sql.Dataset[Long]

scala> sc.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res1: Array[Long] = Array()

scala> spark.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res2: Array[Long] = Array(9223372036854775804, 9223372036854775805, 9223372036854775806)

scala> spark.conf.set("spark.sql.codegen.wholeStage", false)

scala> spark.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res5: Array[Long] = Array()
{code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
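For reference, the expected (empty) result follows from treating the range as half-open, [start, end), advancing by step. Below is a minimal plain-Scala sketch (no Spark dependency) of overflow-safe range sizing under that assumption. The names `RangeSemanticsSketch`, `rangeSize` and `rangeElements` are hypothetical illustrations, not Spark APIs, and this is not Spark's implementation. It also shows how naive Long arithmetic can mislead: here end - start wraps around to a small positive number even though the range is empty. Whether this is the exact code path taken by whole-stage codegen is not established here.

```scala
// Hypothetical sketch, not Spark's implementation: overflow-safe sizing of the
// half-open range [start, end) that advances by `step`.
object RangeSemanticsSketch {

  // Number of elements in [start, end) with the given step. The arithmetic is
  // done in BigInt so that `end - start` cannot wrap around as Long would.
  def rangeSize(start: Long, end: Long, step: Long): BigInt = {
    require(step != 0, "step must be non-zero")
    val s = BigInt(start)
    val e = BigInt(end)
    val st = BigInt(step)
    // The range is empty when the step points away from `end` or start == end.
    if ((st > 0 && s >= e) || (st < 0 && s <= e)) BigInt(0)
    else {
      // Ceiling division of (e - s) by st: both share the same sign here,
      // and BigInt division truncates toward zero.
      (e - s + st - BigInt(st.signum)) / st
    }
  }

  // Materializes the elements; only sensible for small ranges.
  def rangeElements(start: Long, end: Long, step: Long): List[Long] = {
    val n = rangeSize(start, end, step)
    require(n <= BigInt(Int.MaxValue), "range too large to materialize")
    val st = BigInt(step)
    // Each element lies within [start, end), so it always fits in a Long.
    List.tabulate(n.toInt)(i => (BigInt(start) + BigInt(i) * st).toLong)
  }

  def main(args: Array[String]): Unit = {
    val start = java.lang.Long.MAX_VALUE - 3
    val end = java.lang.Long.MIN_VALUE + 2
    println(rangeSize(start, end, 1))     // prints 0: start > end with a positive step
    println(rangeElements(start, end, 1)) // prints List()
    println(end - start)                  // prints 6: the Long subtraction wraps around
    println(rangeElements(0, 10, 3))      // prints List(0, 3, 6, 9)
  }
}
```

Doing the size computation in BigInt trades a little speed for never having to reason about wrap-around; an implementation that stays in Long would instead need explicit overflow checks (for example with `java.lang.Math.subtractExact`).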