[ 
https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kris Mok updated SPARK-21041:
-----------------------------
    Description: 
When whole-stage codegen is enabled, SparkSession.range() behaves inconsistently in the face of integer overflow compared to when codegen is turned off; the codegen-off behavior matches SparkContext.range().

The following Spark Shell session shows the inconsistency:
{code:scala}
scala> sc.range
def range(start: Long,end: Long,step: Long,numSlices: Int): org.apache.spark.rdd.RDD[Long]

scala> spark.range
def range(start: Long,end: Long,step: Long,numPartitions: Int): org.apache.spark.sql.Dataset[Long]
def range(start: Long,end: Long,step: Long): org.apache.spark.sql.Dataset[Long]
def range(start: Long,end: Long): org.apache.spark.sql.Dataset[Long]
def range(end: Long): org.apache.spark.sql.Dataset[Long]

scala> sc.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res1: Array[Long] = Array()

scala> spark.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res2: Array[Long] = Array(9223372036854775804, 9223372036854775805, 9223372036854775806)

scala> spark.conf.set("spark.sql.codegen.wholeStage", false)

scala> spark.range(java.lang.Long.MAX_VALUE - 3, java.lang.Long.MIN_VALUE + 2, 1).collect
res5: Array[Long] = Array()
{code}
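The wrapped values returned under codegen suggest the generated loop derives the range size (or its loop bound) from {{end - start}}, which overflows Long for these endpoints. The following is a minimal plain-Scala sketch of that wrap-around, an assumption about the cause rather than Spark's actual generated code:

```scala
object RangeOverflowSketch {
  def main(args: Array[String]): Unit = {
    val start = java.lang.Long.MAX_VALUE - 3  //  9223372036854775804
    val end   = java.lang.Long.MIN_VALUE + 2  // -9223372036854775806
    // The true difference end - start lies far below Long.MinValue, so the
    // 64-bit subtraction wraps around 2^64 to a small positive number:
    val diff = end - start
    println(diff)  // 6 -- looks like a small, non-empty ascending range
    // An overflow-safe emptiness check compares the endpoints directly
    // (for a positive step, start >= end means the range is empty):
    val empty = start >= end
    println(empty)  // true
  }
}
```

This would explain why the codegen path materializes a few values starting at 9223372036854775804 while the non-codegen and RDD paths correctly return an empty result.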



> With whole-stage codegen, SparkSession.range()'s behavior is inconsistent 
> with SparkContext.range()
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21041
>                 URL: https://issues.apache.org/jira/browse/SPARK-21041
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Kris Mok
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
