[ https://issues.apache.org/jira/browse/SPARK-25379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-25379:
-----------------------------------

    Assignee: Marco Gaido

> Improve ColumnPruning performance
> ---------------------------------
>
>                 Key: SPARK-25379
>                 URL: https://issues.apache.org/jira/browse/SPARK-25379
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Optimizer, SQL
>    Affects Versions: 2.4.0
>            Reporter: Marco Gaido
>            Assignee: Marco Gaido
>            Priority: Major
>             Fix For: 2.5.0
>
>
> The {{--}} operation on {{AttributeSet}} is quite expensive, especially when
> many columns are involved. {{ColumnPruning}} relies heavily on that operator,
> and this affects its running time. Two optimizations are possible:
>  - Improve {{--}} performance;
>  - Replace {{--}} with {{subsetOf}} when possible.
> Moreover, when building {{AttributeSet}}s we often do unneeded operations.
> This also affects other rules, though less significantly.
> I'll provide more details about the performance improvement achievable in the
> PR.
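
To illustrate the second optimization listed in the issue, here is a minimal sketch of why {{subsetOf}} can stand in for {{--}} when the caller only needs to know whether the difference is empty. It uses a plain Scala Set[String] as a stand-in for Spark's AttributeSet. The object and method names are hypothetical and are not taken from the Spark code base or from the PR.

```scala
// Sketch only: Set[String] stands in for Spark's AttributeSet.
// SubsetCheckSketch, coveredViaDiff and coveredViaSubset are hypothetical names.
object SubsetCheckSketch {

  // Builds the whole difference set just to test it for emptiness.
  def coveredViaDiff(required: Set[String], available: Set[String]): Boolean =
    (required -- available).isEmpty

  // Gives the same answer without building an intermediate set,
  // and can stop at the first element that is missing from `available`.
  def coveredViaSubset(required: Set[String], available: Set[String]): Boolean =
    required.subsetOf(available)

  def main(args: Array[String]): Unit = {
    val output = Set("a", "b", "c", "d")
    val needed = Set("a", "c")

    // Both checks agree when every needed column is available...
    println(coveredViaDiff(needed, output) == coveredViaSubset(needed, output))
    // ...and when one column is missing.
    val withMissing = Set("a", "z")
    println(coveredViaDiff(withMissing, output) == coveredViaSubset(withMissing, output))
  }
}
```

The replacement only applies where the result of {{--}} is used for an emptiness test. When the remaining attributes themselves are needed, the difference still has to be computed, which is where the first optimization would come in.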