[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

mgaido91 Sat, 14 Oct 2017 00:08:40 -0700

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19480#discussion_r144688322
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2103,4 +2103,35 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
           testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
           Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
       }
    +
    +  test("SPARK-22226: splitExpressions should not generate codes beyond 64KB") {
    +    val colNumber = 10000
    +    val input = spark.range(2).rdd.map(_ => Row(1 to colNumber: _*))
    +    val df = sqlContext.createDataFrame(input, StructType(
    +      (1 to colNumber).map(colIndex => StructField(s"_$colIndex", IntegerType, false))))
    +    val newCols = (1 to colNumber).flatMap { colIndex =>
    +      Seq(expr(s"if(1000 < _$colIndex, 1000, _$colIndex)"),
    +        expr(s"sqrt(_$colIndex)"))
    +    }
    +    df.select(newCols: _*).collect()
    +  }
    +
    +  test("SPARK-22226: too many splitted expressions should not exceed constant pool limit") {
    --- End diff --
    
    You are right, @viirya. Sorry, I didn't notice. Yes, the problem is that 
most of the time both issues (generated code beyond 64KB and the constant pool 
limit) occur together at the moment, so solving only one of them is not enough. 
It turns out that there are some corner cases in which this fix alone is 
enough, like the real case I am working on, but they are not easy to reproduce 
in a simple way. That use case has a lot of complex projections, a 
`dropDuplicates`, and some joins after that, but the queries are made of 
thousands of lines of SQL code.
    The only way I have been able to reproduce it easily is with this test case: 
https://github.com/apache/spark/pull/19480/files#r144302922.



