Hi everybody,
At the moment, Catalyst rules are defined using two different types of rules:
`Rule[LogicalPlan]` and `Strategy` (which in turn maps to
`GenericStrategy[SparkPlan]`).
I propose to introduce utility methods to
a) reduce the boilerplate to define rewrite rules
b) turning them back into what they essentially represent: function types.
These changes would be backwards compatible, and would greatly help in
understanding what the code does. Personally, I feel like the current
use of objects is redundant and possibly confusing.
## `Rule[LogicalPlan]`
The analyzer and optimizer use `Rule[LogicalPlan]`, which, besides
defining a default `val ruleName`
only defines the method `apply(plan: TreeType): TreeType`.
Because the body of such method is always supposed to read `plan match
pf`, with `pf`
being some `PartialFunction[LogicalPlan, LogicalPlan]`, we can
conclude that `Rule[LogicalPlan]`
might be substituted by a PartialFunction.
I propose the following:
a) Introduce the utility method
def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
new Rule[LogicalPlan] {
def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
}
b) progressively replace the boilerplate-y object definitions; e.g.
object MyRewriteRule extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case ... => ...
}
with
// define a Rule[LogicalPlan]
val MyRewriteRule = rule {
case ... => ...
}
it might also be possible to make rule method `implicit`, thereby
further reducing MyRewriteRule to:
// define a PartialFunction[LogicalPlan, LogicalPlan]
// the implicit would convert it into a Rule[LogicalPlan] at the use sites
val MyRewriteRule = {
case ... => ...
}
## Strategies
A similar solution could be applied to shorten the code for
Strategies, which are total functions
only because they are all supposed to manage the default case,
possibly returning `Nil`. In this case
we might introduce the following utility methods:
/**
* Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
* The partial function must therefore return *one single* SparkPlan
for each case.
* The method will automatically wrap them in a [[Seq]].
* Unhandled cases will automatically return Seq.empty
*/
protected def rule(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
new Strategy {
def apply(plan: LogicalPlan): Seq[SparkPlan] =
if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
}
/**
* Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] ].
* The partial function must therefore return a Seq[SparkPlan] for each case.
* Unhandled cases will automatically return Seq.empty
*/
protected def seqrule(pf: PartialFunction[LogicalPlan,
Seq[SparkPlan]]): Strategy =
new Strategy {
def apply(plan: LogicalPlan): Seq[SparkPlan] =
if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
}
Thanks in advance
e.v.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]