Hiroshi Inoue created SPARK-16331:
-------------------------------------

             Summary: [SQL] Reduce code generation time 
                 Key: SPARK-16331
                 URL: https://issues.apache.org/jira/browse/SPARK-16331
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.0.0, 2.1.0
            Reporter: Hiroshi Inoue


During code generation, a {{LocalRelation}} often holds a huge {{Vector}} as its {{data}} field. In the simple example below, the {{LocalRelation}} holds a {{Vector}} of 1000000 {{UnsafeRow}} elements.

{quote}
val numRows = 1000000
val ds = (1 to numRows).toDS().persist()
benchmark.addCase("filter+reduce") { iter =>
  ds.filter(a => (a & 1) == 0).reduce(_ + _)
}
{quote}

In {{TreeNode.transformChildren}}, every element of the vector is unnecessarily iterated to check whether any child nodes exist in it, because {{Vector}} is {{Traversable}}. This scan significantly increases code generation time.

This patch avoids the overhead by checking the number of children before iterating over the elements; {{LocalRelation}} has no children because it extends {{LeafNode}}.
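The gist of the check can be sketched as follows. This is a minimal model with hypothetical names ({{Node}}, {{Relation}}, {{elementsScanned}}), not the actual Catalyst classes; it only illustrates why skipping the field scan for childless nodes avoids walking the large {{Vector}}.

```scala
// A minimal sketch of the short-circuit, with hypothetical simplified
// names; the real TreeNode.transformChildren in Catalyst is far more involved.
abstract class Node extends Product {
  def children: Seq[Node]

  var elementsScanned = 0 // instrumentation to show the scan is skipped

  def transformChildren(f: Node => Node): Node = {
    // The fix: a node with no children (e.g. a LeafNode such as
    // LocalRelation) cannot hold child nodes in its fields, so the
    // per-element scan of collection-valued fields below is skipped.
    if (children.isEmpty) return this
    productIterator.foreach {
      case t: Iterable[_] => t.foreach(_ => elementsScanned += 1)
      case _              =>
    }
    this
  }
}

case class Relation(data: Vector[Int]) extends Node {
  override def children: Seq[Node] = Nil // leaf node: no children
}

val rel = Relation(Vector.fill(1000000)(0))
rel.transformChildren(identity)
// rel.elementsScanned stays 0: the million-element Vector is never walked
```

Without the {{children.isEmpty}} guard, the pattern match would iterate all 1000000 elements of {{data}} even though a leaf node can never contain a child there.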

Performance of the above example:
{quote}
without this patch
Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14 on Mac OS X 10.11.5
Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
compilationTime:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
filter+reduce                                 4426 / 4533          0.2       4426.0       1.0X

with this patch
compilationTime:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
filter+reduce                                 3117 / 3391          0.3       3116.6       1.0X
{quote}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
