[ https://issues.apache.org/jira/browse/SPARK-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin resolved SPARK-16331.
---------------------------------
    Resolution: Fixed
      Assignee: Hiroshi Inoue
 Fix Version/s: 2.1.0

> [SQL] Reduce code generation time
> ----------------------------------
>
>                  Key: SPARK-16331
>                  URL: https://issues.apache.org/jira/browse/SPARK-16331
>              Project: Spark
>           Issue Type: Improvement
>           Components: SQL
>     Affects Versions: 2.0.0, 2.1.0
>             Reporter: Hiroshi Inoue
>             Assignee: Hiroshi Inoue
>              Fix For: 2.1.0
>
> During code generation, a {{LocalRelation}} often holds a huge {{Vector}} object as {{data}}. In the simple example below, the {{LocalRelation}} has a {{Vector}} with 1,000,000 {{UnsafeRow}} elements.
> {code}
> val numRows = 1000000
> val ds = (1 to numRows).toDS().persist()
> benchmark.addCase("filter+reduce") { iter =>
>   ds.filter(a => (a & 1) == 0).reduce(_ + _)
> }
> {code}
> In {{TreeNode.transformChildren}}, all elements of the vector are unnecessarily iterated to check whether any children exist in the vector, since {{Vector}} is Traversable. This step significantly increases code generation time.
> This patch avoids the overhead by checking the number of children before iterating over all elements; {{LocalRelation}} has no children, since it extends {{LeafNode}}.
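The effect of the guard can be sketched outside Spark. The snippet below is a minimal, hypothetical model (not Spark's actual `TreeNode` code): a node carries a large `Vector` as data, a naive transform scans every `Traversable` field looking for child nodes, and the guarded transform returns a leaf unchanged, skipping the scan.

```scala
// Hedged sketch, not Spark's real TreeNode: Node, transformChildrenNaive,
// and transformChildrenGuarded are illustrative names.
object TransformChildrenSketch {
  var elementsVisited = 0 // counts how many Vector elements get touched

  case class Node(name: String, data: Vector[Int], children: Seq[Node])

  // Naive version: traverses the data Vector even when the node is a leaf,
  // because any Traversable field is scanned for potential children.
  def transformChildrenNaive(n: Node, f: Node => Node): Node = {
    val scanned = n.data.map { x => elementsVisited += 1; x } // wasted work
    n.copy(data = scanned, children = n.children.map(f))
  }

  // Guarded version (the idea behind the patch): a node with no children
  // is returned as-is, so the data Vector is never iterated.
  def transformChildrenGuarded(n: Node, f: Node => Node): Node =
    if (n.children.isEmpty) n
    else n.copy(children = n.children.map(f))

  def main(args: Array[String]): Unit = {
    val leaf = Node("LocalRelation", Vector.tabulate(1000000)(identity), Nil)

    elementsVisited = 0
    transformChildrenNaive(leaf, identity)
    println(elementsVisited)   // prints 1000000

    elementsVisited = 0
    transformChildrenGuarded(leaf, identity)
    println(elementsVisited)   // prints 0
  }
}
```

In Spark itself the guard is cheap because `LeafNode` implies an empty `children` sequence, so the check short-circuits before any collection traversal.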
> The performance of the above example:
> {quote}
> without this patch
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14 on Mac OS X 10.11.5
> Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
> compilationTime:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
> --------------------------------------------------------------------------------
> filter+reduce                   4426 / 4533         0.2        4426.0       1.0X
>
> with this patch
> compilationTime:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
> --------------------------------------------------------------------------------
> filter+reduce                   3117 / 3391         0.3        3116.6       1.0X
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)