[ https://issues.apache.org/jira/browse/SPARK-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-7754: ----------------------------- Component/s: SQL > Use PartialFunction literals instead of objects in Catalyst > ----------------------------------------------------------- > > Key: SPARK-7754 > URL: https://issues.apache.org/jira/browse/SPARK-7754 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Edoardo Vacchi > > Catalyst rules extend two distinct "rule" types: {{Rule[LogicalPlan]}} and > {{Strategy}} (which is an alias for {{GenericStrategy[SparkPlan]}}). > The distinction is fairly subtle: in the end, both rule types are supposed to > define a method {{apply(plan: LogicalPlan)}} > (where LogicalPlan is either Logical- or Spark-) which returns a transformed > plan (or a sequence thereof, in the case > of Strategy). > Ceremonies asides, the body of such method is always of the kind: > {code:java} > def apply(plan: PlanType) = plan match pf > {code} > where `pf` would be some `PartialFunction` of the PlanType: > {code:java} > val pf = { > case ... => ... > } > {code} > This is JIRA is a proposal to introduce utility methods to > a) reduce the boilerplate to define rewrite rules > b) turning them back into what they essentially represent: function types. > These changes would be backwards compatible, and would greatly help in > understanding what the code does. Current use of objects is redundant and > possibly confusing. > *{{Rule[LogicalPlan]}}* > a) Introduce the utility object > {code:java} > object rule > def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): > Rule[LogicalPlan] = > new Rule[LogicalPlan] { > def apply (plan: LogicalPlan): LogicalPlan = plan transform pf > } > def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): > Rule[LogicalPlan] = > new Rule[LogicalPlan] { > override val ruleName = name > def apply (plan: LogicalPlan): LogicalPlan = plan transform pf > } > {code} > b) progressively replace the boilerplate-y object definitions; e.g. > {code:java} > object MyRewriteRule extends Rule[LogicalPlan] { > def apply(plan: LogicalPlan): LogicalPlan = plan transform { > case ... => ... > } > {code} > with > {code:java} > // define a Rule[LogicalPlan] > val MyRewriteRule = rule { > case ... => ... > } > {code} > and/or : > {code:java} > // define a named Rule[LogicalPlan] > val MyRewriteRule = rule.named("My rewrite rule") { > case ... => ... > } > {code} > *Strategies* > A similar solution could be applied to shorten the code for > Strategies, which are total functions > only because they are all supposed to manage the default case, > possibly returning `Nil`. In this case > we might introduce the following utility: > {code:java} > object strategy { > /** > * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan]. > * The partial function must therefore return *one single* SparkPlan for > each case. > * The method will automatically wrap them in a [[Seq]]. > * Unhandled cases will automatically return Seq.empty > */ > def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy = > new Strategy { > def apply(plan: LogicalPlan): Seq[SparkPlan] = > if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty > } > /** > * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] > ]. > * The partial function must therefore return a Seq[SparkPlan] for each > case. > * Unhandled cases will automatically return Seq.empty > */ > def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy = > new Strategy { > def apply(plan: LogicalPlan): Seq[SparkPlan] = > if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan] > } > } > {code} > Usage: > {code:java} > val mystrategy = strategy { case ... => ... } > val seqstrategy = strategy.seq { case ... => ... } > {code} > *Further possible improvements:* > Making the utility methods `implicit`, thereby > further reducing the rewrite rules to: > {code:java} > // define a PartialFunction[LogicalPlan, LogicalPlan] > // the implicit would convert it into a Rule[LogicalPlan] at the use sites > val MyRewriteRule = { > case ... => ... > } > {code} > *Caveats* > Because of the way objects are initialized vs. vals, it might be necessary > reorder instructions so that vals are actually initialized before they are > used. > E.g.: > {code:java} > class MyOptimizer extends Optimizer { > override val batches: Seq[Batch] = > ... > Batch("Other rules", FixedPoint(100), > MyRewriteRule // <--- might throw NPE > val MyRewriteRule = ... > } > {code} > this is easily fixed by factoring rules out as follows: > {code:java} > class MyOptimizer extends Optimizer with CustomRules { > val MyRewriteRule = ... > override val batches: Seq[Batch] = > ... > Batch("Other rules", FixedPoint(100), > MyRewriteRule > } > {code} > or, even better, further modularizing, e.g using traits or container objects: > {code:java} > class MyOptimizer extends Optimizer with CustomRules { > override val batches: Seq[Batch] = > ... > Batch("Other rules", FixedPoint(100), > MyRewriteRule > } > trait CustomRules { > val MyRewriteRule = ... > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org