[jira] [Resolved] (SPARK-21603) The wholestage codegen will be much slower than when wholestage codegen is disabled when the function is too long

Xiao Li (JIRA) Wed, 16 Aug 2017 09:14:42 -0700

     [ https://issues.apache.org/jira/browse/SPARK-21603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Xiao Li resolved SPARK-21603.
-----------------------------
       Resolution: Fixed
         Assignee: eaton
    Fix Version/s: 2.3.0

> The wholestage codegen will be much slower than when wholestage codegen is 
> disabled when the function is too long
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21603
>                 URL: https://issues.apache.org/jira/browse/SPARK-21603
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: eaton
>            Assignee: eaton
>             Fix For: 2.3.0
>
>
> A benchmark runs about 7x slower (reported as 0.1X relative) with whole-stage 
> codegen enabled when the generated function is too long:
> ignore("max function length of wholestagecodegen") {
>     val N = 20 << 15
>     val benchmark = new Benchmark("max function length of wholestagecodegen", N)
>     def f(): Unit = sparkSession.range(N)
>       .selectExpr(
>         "id",
>         "(id & 1023) as k1",
>         "cast(id & 1023 as double) as k2",
>         "cast(id & 1023 as int) as k3",
>         "case when id > 100 and id <= 200 then 1 else 0 end as v1",
>         "case when id > 200 and id <= 300 then 1 else 0 end as v2",
>         "case when id > 300 and id <= 400 then 1 else 0 end as v3",
>         "case when id > 400 and id <= 500 then 1 else 0 end as v4",
>         "case when id > 500 and id <= 600 then 1 else 0 end as v5",
>         "case when id > 600 and id <= 700 then 1 else 0 end as v6",
>         "case when id > 700 and id <= 800 then 1 else 0 end as v7",
>         "case when id > 800 and id <= 900 then 1 else 0 end as v8",
>         "case when id > 900 and id <= 1000 then 1 else 0 end as v9",
>         "case when id > 1000 and id <= 1100 then 1 else 0 end as v10",
>         "case when id > 1100 and id <= 1200 then 1 else 0 end as v11",
>         "case when id > 1200 and id <= 1300 then 1 else 0 end as v12",
>         "case when id > 1300 and id <= 1400 then 1 else 0 end as v13",
>         "case when id > 1400 and id <= 1500 then 1 else 0 end as v14",
>         "case when id > 1500 and id <= 1600 then 1 else 0 end as v15",
>         "case when id > 1600 and id <= 1700 then 1 else 0 end as v16",
>         "case when id > 1700 and id <= 1800 then 1 else 0 end as v17",
>         "case when id > 1800 and id <= 1900 then 1 else 0 end as v18")
>       .groupBy("k1", "k2", "k3")
>       .sum()
>       .collect()
>     benchmark.addCase(s"codegen = F") { iter =>
>       sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
>       f()
>     }
>     benchmark.addCase(s"codegen = T") { iter =>
>       sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
>       sparkSession.conf.set("spark.sql.codegen.MaxFunctionLength", "10000")
>       f()
>     }
>     benchmark.run()
>     /*
>     Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14 on Windows 7 6.1
>     Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
>     max function length of wholestagecodegen: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
>     ------------------------------------------------------------------------------------------------
>     codegen = F                                    443 /  507          1.5         676.0       1.0X
>     codegen = T                                   3279 / 3283          0.2        5002.6       0.1X
>      */
>   }
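
The figures in the benchmark output can be cross-checked with a few lines of plain Scala. This is only a sketch of the arithmetic, not part of the reported test: the object name `SlowdownCheck` and its members are hypothetical, and the two Best times are copied by hand from the output above. It derives the row count from `20 << 15`, the per-row cost from a wall time, and the ratio between the two cases.

```scala
// Hypothetical helper for illustration only; it does not use Spark.
object SlowdownCheck {
  // Row count used by the benchmark: N = 20 << 15
  val N: Long = 20L << 15

  // Best times in milliseconds, copied from the benchmark output
  val bestMsCodegenOff: Double = 443.0
  val bestMsCodegenOn: Double = 3279.0

  /** Nanoseconds spent per row for a wall time given in milliseconds. */
  def perRowNs(ms: Double): Double = ms * 1e6 / N

  /** How many times slower the codegen = T case is than codegen = F. */
  def slowdown: Double = bestMsCodegenOn / bestMsCodegenOff

  def main(args: Array[String]): Unit = {
    println(s"rows = $N")
    println(f"codegen = F: ${perRowNs(bestMsCodegenOff)}%.1f ns/row")
    println(f"codegen = T: ${perRowNs(bestMsCodegenOn)}%.1f ns/row")
    println(f"slowdown: $slowdown%.1fx")
  }
}
```

With the Best times above, the ratio works out to roughly 7.4, which the benchmark's Relative column rounds to 0.1X. The per-row value for the codegen = T case can differ slightly from the printed 5002.6 because the Best time in the table is itself rounded to whole milliseconds.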



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

