Hi Julian,

I agree that in your example normalization may have somewhat different
concerns than simplification. However, normalization and simplification
sometimes address similar problems as well. For example, simplification
may reduce the search space, but so does normalization. E.g., a
normalized ordering of operands in a join condition may allow
equivalent nodes to be merged that would otherwise be considered
non-equivalent. Do any of the currently implemented rules depend on some
normalized representation?
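
To illustrate the merge argument, here is a toy sketch. It is not Calcite
code: DigestDemo, equalsDigest and countDistinct are hypothetical names, and
the "memo" is just a map keyed by a digest string. It only shows that two
joins on `$0 = $3` and `$3 = $0` collapse into one node once the operands are
ordered canonically.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model; not Calcite's RexNode or planner API. */
final class DigestDemo {
  private DigestDemo() {}

  /** Builds a digest for "left = right", optionally ordering the operands canonically. */
  static String equalsDigest(String left, String right, boolean normalize) {
    if (normalize && left.compareTo(right) > 0) {
      // Swap so that the lexicographically smaller operand always comes first.
      String tmp = left;
      left = right;
      right = tmp;
    }
    return "=(" + left + ", " + right + ")";
  }

  /** Registers join conditions in a digest-keyed map and returns the number of distinct nodes. */
  static int countDistinct(boolean normalize, String[][] conditions) {
    Map<String, Integer> memo = new HashMap<>();
    for (String[] c : conditions) {
      String digest = "Join(" + equalsDigest(c[0], c[1], normalize) + ")";
      memo.putIfAbsent(digest, memo.size());
    }
    return memo.size();
  }
}
```

With the conditions `{"$0", "$3"}` and `{"$3", "$0"}`, the non-normalized
variant registers two nodes, while the normalized one registers a single node.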

Also, as many rules (such as join reorder rules) generate filters, I would
argue that moving normalization to a separate phase might cause an
unnecessary expansion of the search space.

The idea I expressed above is inspired by CockroachDB (again :-)). In
CockroachDB, expressions are part of the MEMO and are treated similarly to
relational operators, which allows for a unified rule infrastructure for
both operators and expressions. Expressions are created using a
context-aware builder, which knows the set of active normalization rules.
Whenever the builder is about to create a new expression (not necessarily
a top-level one), the normalization rules are invoked in a heuristic manner.
Code generation is used to build the heuristic rule executor. Both
normalization and simplification (in our terms) rules are invoked here. For
example, see [1] (normalization) and [2] (simplification). Finally, the
expression is registered in MEMO. As a result, every expression ever
produced is always in a normalized/simplified form.
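
A minimal sketch of that flow, under loud assumptions: this is neither
CockroachDB's generated code nor Calcite's RexBuilder. NormalizingBuilder,
Expr, literalOnRight and andTrue are hypothetical names, the rules are plain
hand-written functions rather than generated code, and the "memo" is a simple
map. It only shows the shape of the idea: one entry point that applies the
active rules until nothing changes and then registers the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical builder: every expression is normalized before it is registered. */
final class NormalizingBuilder {
  /** Toy expression: an operator name plus operands; leaves have no operands. */
  static final class Expr {
    final String op;
    final List<Expr> operands;
    Expr(String op, Expr... operands) {
      this.op = op;
      this.operands = Arrays.asList(operands);
    }
    @Override public boolean equals(Object o) {
      return o instanceof Expr && op.equals(((Expr) o).op)
          && operands.equals(((Expr) o).operands);
    }
    @Override public int hashCode() { return Objects.hash(op, operands); }
  }

  private final List<Function<Expr, Optional<Expr>>> rules;
  private final Map<Expr, Integer> memo = new HashMap<>();

  NormalizingBuilder(List<Function<Expr, Optional<Expr>>> rules) { this.rules = rules; }

  /** Creates a builder with the two demo rules below. */
  static NormalizingBuilder withDemoRules() {
    List<Function<Expr, Optional<Expr>>> rules = new ArrayList<>();
    rules.add(NormalizingBuilder::literalOnRight);
    rules.add(NormalizingBuilder::andTrue);
    return new NormalizingBuilder(rules);
  }

  /** Single entry point: apply rules until a fixpoint, then register in the memo. */
  Expr call(String op, Expr... operands) {
    Expr e = new Expr(op, operands);
    boolean changed = true;
    while (changed) {
      changed = false;
      for (Function<Expr, Optional<Expr>> rule : rules) {
        Optional<Expr> rewritten = rule.apply(e);
        if (rewritten.isPresent() && !rewritten.get().equals(e)) {
          e = rewritten.get();
          changed = true;
        }
      }
    }
    memo.putIfAbsent(e, memo.size());
    return e;
  }

  int memoSize() { return memo.size(); }

  /** In this toy model, a leaf whose name does not start with '$' is a literal. */
  static boolean isLiteral(Expr e) { return e.operands.isEmpty() && !e.op.startsWith("$"); }

  /** Normalization: rewrite "literal = ref" to "ref = literal". */
  static Optional<Expr> literalOnRight(Expr e) {
    if (e.op.equals("=") && e.operands.size() == 2
        && isLiteral(e.operands.get(0)) && !isLiteral(e.operands.get(1))) {
      return Optional.of(new Expr("=", e.operands.get(1), e.operands.get(0)));
    }
    return Optional.empty();
  }

  /** Simplification: rewrite "x AND TRUE" to "x". */
  static Optional<Expr> andTrue(Expr e) {
    if (e.op.equals("AND") && e.operands.size() == 2
        && e.operands.get(1).equals(new Expr("TRUE"))) {
      return Optional.of(e.operands.get(0));
    }
    return Optional.empty();
  }
}
```

In this sketch, building `10 = $0` and `$0 = 10` through `call` yields equal
expressions and a single memo entry, and wrapping the result in `AND TRUE`
returns the same expression without adding a new entry.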

I am not saying that we should follow this approach. But IMO (1) unified
handling of simplification and normalization through rules and (2) a single
entry point for all normalization (builder) are interesting design
decisions, as they offer both flexibility and convenience.

Regards,
Vladimir.

[1]
https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/scalar.opt#L8
[2]
https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/bool.opt#L30

On Fri, Mar 12, 2021 at 07:15, Julian Hyde <jhyde.apa...@gmail.com> wrote:

> Without simplifications, many trivial RelNodes would be produced. It is
> beneficial to have those in RelBuilder; if they were in rules, the trivial
> RelNodes (and equivalence sets) would still be present, increasing the size
> of the search space.
>
> I want to draw a distinction between simplification and normalization. A
> normalized form is relied upon throughout the system. Suppose for example,
> that we always normalize ‘RexLiteral = RexInputRef’ to ‘RexInputRef =
> RexLiteral’. If a rule encountered the former case, it would not be a bug
> if the rule failed with, say, a ClassCastException.
>
> So, I disagree with Vladimir that 'RexSimplify may also be considered a
> “normalization”’. If simplification is turned off, each rule must be able
> to deal with the unsimplified expressions.
>
> Also, the very idea of normalizations being optional, enabled by system
> properties or other config, is rather disturbing, because the rules
> probably don’t know that the normalization has been turned off.
>
> The only place for normalization, in my opinion, is explicitly, in a
> particular planner phase. For example, pulling up all filters before
> attempting to match materialized views.
>
> Julian
>
> > On Mar 11, 2021, at 10:37 AM, Vladimir Ozerov <ppoze...@gmail.com> wrote:
> >
> > In our practice, we also had some problems with normalization. First, we
> > observed problems with the unwanted (and sometimes incorrect)
> > simplification of expressions with CASTs and literals which came from
> > RexSimplify. I couldn't find an easy way to disable that behavior.
> > Note that RexSimplify may also be considered a "normalization". Second,
> > we implemented a way to avoid Project when doing join reordering but had
> > some issues with operator signatures due to the lack of automatic
> > normalization of expressions for permuted inputs. These two cases
> > demonstrate two opposite views: sometimes you want a specific
> > normalization to happen automatically, but sometimes you want to
> > disable it.
> >
> > Perhaps an alternative approach could be to unify all simplification and
> > normalization logic and split it into configurable rules. Then, we may
> > add these rules as a separate rule set to the planner, which would be
> > invoked heuristically every time an operator with expressions is
> > registered in MEMO. In this case, a user would not need to bother about
> > RexNode constructors. To clarify, by "rules" I do not mean heavy-weight
> > rules similar to normal rules. Instead, they might be simple
> > pattern+method pairs that could even be compiled into a static program
> > using Janino. This approach could be very flexible and convenient: a
> > single place in the code where all rewrites happen, complete control of
> > the optimization rules, modular rules instead of monolithic code (like
> > in RexSimplify). The obvious downside - it would require more time to
> > implement than other proposed approaches.
> >
> > What do you think about that?
> >
> > Regards,
> > Vladimir.
> >
> > On Thu, Mar 11, 2021 at 13:33, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:
> >
> >> Stamatis>just the option to use it or not in a more friendly way
> >> Stamatis>than a system property.
> >>
> >> As far as I remember, the key issue here is that new RexBuilder(...) is
> >> a quite common pattern, and what you suggest looks like "everyone would
> >> have to provide extra argument when creating RexBuilder".
> >>
> >> On top of that, there are use cases like "new RexCall(...)" in the
> >> static context (see org.apache.calcite.rex.RexUtil#not).
> >>
> >> Making the uses customizable adds significant overhead with doubtful
> >> gains.
> >>
> >> I have not explored the route though, so there might be solutions.
> >> For instance, it might work if we have an in-core dependency injection
> >> that would hide the complexity when coding :core, however, I don't
> >> think we could expose DI to Calcite users.
> >>
> >> Vladimir
> >>
>
>
