Overall this seems like a reasonable proposal to me. Here are a few thoughts:
- There is some debugging utility to the ruleName, so we would probably want to at least make that an argument to the rule function (see the sketch after this list).
- We also have had rules that operate on SparkPlan, though since there is only one ATM, maybe we don't need sugar there.
- I would not call the sugar for creating Strategies rule/seqrule, as I think the one-to-one vs. one-to-many distinction is useful.
- I'm generally pro-refactoring to make the code nicer, especially when it's not official public API, but I do think it's important to maintain source compatibility (which I think you are) when possible, as there are other projects using catalyst.
- Finally, we'll have to balance this with other code changes / conflicts. You should probably open a JIRA and we can continue the discussion there.
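For concreteness, a rough sketch of the name-as-argument idea (the `rule(name)(...)` signature, the wrapper object, and the example rewrite are illustrative only, not part of the proposal):

    import org.apache.spark.sql.catalyst.expressions.Literal
    import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
    import org.apache.spark.sql.catalyst.rules.Rule

    object RuleSugar {
      // Keep the rule name as an explicit argument so it still shows up in
      // debug output instead of falling back to the anonymous class name.
      def rule(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
        new Rule[LogicalPlan] {
          override val ruleName: String = name
          def apply(plan: LogicalPlan): LogicalPlan = plan transform pf
        }

      // Usage: a one-to-one rewrite that drops filters that are always true.
      val RemoveTrivialFilters = rule("RemoveTrivialFilters") {
        case Filter(Literal(true, _), child) => child
      }
    }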
On Tue, May 19, 2015 at 4:16 AM, Edoardo Vacchi <uncommonnonse...@gmail.com> wrote:

> Hi everybody,
>
> At the moment, Catalyst rules are defined using two different types of
> rules: `Rule[LogicalPlan]` and `Strategy` (which in turn maps to
> `GenericStrategy[SparkPlan]`).
>
> I propose to introduce utility methods to
>
> a) reduce the boilerplate needed to define rewrite rules, and
> b) turn them back into what they essentially represent: function types.
>
> These changes would be backwards compatible, and would greatly help in
> understanding what the code does. Personally, I feel like the current
> use of objects is redundant and possibly confusing.
>
> ## `Rule[LogicalPlan]`
>
> The analyzer and optimizer use `Rule[LogicalPlan]`, which, besides
> defining a default `val ruleName`, only defines the method
> `apply(plan: TreeType): TreeType`. Because the body of such a method is
> always supposed to read `plan transform pf`, with `pf` being some
> `PartialFunction[LogicalPlan, LogicalPlan]`, we can conclude that
> `Rule[LogicalPlan]` might be substituted by a PartialFunction.
>
> I propose the following:
>
> a) Introduce the utility method
>
>     def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
>       new Rule[LogicalPlan] {
>         def apply(plan: LogicalPlan): LogicalPlan = plan transform pf
>       }
>
> b) Progressively replace the boilerplate-y object definitions; e.g.
>
>     object MyRewriteRule extends Rule[LogicalPlan] {
>       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
>         case ... => ...
>       }
>     }
>
> with
>
>     // defines a Rule[LogicalPlan]
>     val MyRewriteRule = rule {
>       case ... => ...
>     }
>
> It might also be possible to make the rule method `implicit`, thereby
> further reducing MyRewriteRule to:
>
>     // defines a PartialFunction[LogicalPlan, LogicalPlan];
>     // the implicit would convert it into a Rule[LogicalPlan] at the use sites
>     val MyRewriteRule = {
>       case ... => ...
>     }
>
> ## Strategies
>
> A similar solution could be applied to shorten the code for Strategies,
> which are total functions only because they are all supposed to handle
> the default case, possibly returning `Nil`. In this case we might
> introduce the following utility methods:
>
>     /**
>      * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
>      * The partial function must therefore return *one single* SparkPlan
>      * for each case; the method will automatically wrap it in a [[Seq]].
>      * Unhandled cases will automatically return Seq.empty.
>      */
>     protected def rule(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
>       new Strategy {
>         def apply(plan: LogicalPlan): Seq[SparkPlan] =
>           if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
>       }
>
>     /**
>      * Generate a Strategy from a PartialFunction[LogicalPlan, Seq[SparkPlan]].
>      * The partial function must therefore return a Seq[SparkPlan] for each case.
>      * Unhandled cases will automatically return Seq.empty.
>      */
>     protected def seqrule(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy =
>       new Strategy {
>         def apply(plan: LogicalPlan): Seq[SparkPlan] =
>           if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
>       }
>
> Thanks in advance
> e.v.
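For reference, the implicit variant mentioned in the quoted proposal could look roughly like this (a sketch only; the object and method names are made up, and it needs `scala.language.implicitConversions`):

    import scala.language.implicitConversions

    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule

    object RuleConversions {
      // Converts a plain PartialFunction into a Rule[LogicalPlan] wherever
      // a Rule is expected, e.g. when the rule is registered in a batch.
      implicit def pfToRule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
        new Rule[LogicalPlan] {
          def apply(plan: LogicalPlan): LogicalPlan = plan transform pf
        }

      // Usage: define the rewrite as a plain PartialFunction...
      val MyRewrite: PartialFunction[LogicalPlan, LogicalPlan] = {
        case p => p // placeholder case
      }

      // ...and let the implicit supply the Rule where one is expected.
      val asRule: Rule[LogicalPlan] = MyRewrite
    }

One caveat: the bare `val MyRewriteRule = { case ... => ... }` form would still need an explicit PartialFunction type annotation, since a pattern-matching block only typechecks against an expected function type; the conversion then fires at the use sites, as the comment in the proposal suggests.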