Re: [DISCUSS] FLIP-240: Introduce "ANALYZE TABLE" Syntax

godfrey he Sun, 12 Jun 2022 23:43:28 -0700

Hi Ingo,

The semantics do not distinguish between batch and streaming.
The statement works for both, but the result for an
unbounded source is meaningless.
Currently, I throw an exception in streaming mode,
and we can support streaming mode with bounded sources
in the future.
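
To illustrate the behaviour described above, here is a minimal sketch in
Python. It is not the POC code: the function name, the exception class and
the statistics collected (row count and per-column non-null count) are
hypothetical and chosen only for illustration.

```python
from typing import List


class StreamingModeNotSupported(Exception):
    """Hypothetical error type for ANALYZE TABLE issued in streaming mode."""


def analyze_to_select(table: str, columns: List[str], streaming: bool) -> str:
    """Build the SELECT statement that an ANALYZE TABLE could be converted to.

    Assumption: only a row count and per-column non-null counts are gathered.
    """
    if streaming:
        # The result for an unbounded source would be meaningless, so reject.
        raise StreamingModeNotSupported(
            "ANALYZE TABLE is not supported in streaming mode"
        )
    exprs = ["COUNT(1) AS `row_count`"]
    for name in columns:
        exprs.append(f"COUNT(`{name}`) AS `{name}_non_null`")
    return "SELECT " + ", ".join(exprs) + f" FROM `{table}`"


if __name__ == "__main__":
    print(analyze_to_select("orders", ["id", "note"], streaming=False))
```

Because the output is an ordinary SELECT, any optimizer rule that applies to
user-written queries would apply to it as well.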


Best,
Godfrey

On Mon, Jun 13, 2022 at 14:17, Ingo Bürk <airbla...@apache.org> wrote:
>
> Hi Godfrey,
>
> thank you for the explanation. A SELECT is definitely more generic and
> will work for all connectors automatically. As such I think it's a good
> baseline solution regardless.
>
> We can also think about allowing connector-specific optimizations in the
> future, but I do like your idea of letting the optimizer rules perform a
> lot of the work here already by leveraging existing optimizations.
> Similarly, things like non-null counts of non-nullable columns would (or
> at least could) be handled by the optimizer rules already.
>
> So as far as that point goes, +1 to the generic approach.
>
> One more point, though: In general we should avoid supporting features
> only in specific modes as it breaks the unification promise. Given that
> ANALYZE is a manual and completely optional operation I'm OK with doing
> that here in principle. However, I wonder what will happen in the
> streaming / unbounded case. Do you plan to throw an error? Or do we
> complete the command as successful but without doing anything?
>
>
> Best
> Ingo
>
> On 13.06.22 05:50, godfrey he wrote:
> > Hi Ingo,
> >
> > Thanks for the inputs.
> >
> > I think converting `ANALYZE TABLE` to a `SELECT` statement is the
> > more generic approach. Because query plan optimization is generic,
> > we can provide more optimization rules that optimize not only the
> > `SELECT` statement converted from `ANALYZE TABLE` but also the
> > `SELECT` statements written by users.
> >
> >> JDBC connector can get a row count estimate without performing a
> >> SELECT COUNT(1)
> > To optimize such cases, we can implement a rule that pushes the
> > aggregate into the table source.
> > Currently, there is a similar rule: SupportsAggregatePushDown, which
> > supports pushing only local aggregates into the source for now.
> >
> >
> > Best,
> > Godfrey
> >
> > On Fri, Jun 10, 2022 at 17:15, Ingo Bürk <airbla...@apache.org> wrote:
> >>
> >> Hi Godfrey,
> >>
> >> compared to the solution proposed in the FLIP (using a SELECT
> >> statement), I wonder if you have considered adding APIs to catalogs /
> >> connectors to perform this task as an alternative?
> >> I could imagine that for many connectors, statistics could be
> >> implemented in a less expensive way by leveraging the underlying system
> >> (e.g. a JDBC connector can get a row count estimate without performing a
> >> SELECT COUNT(1)).
> >>
> >>
> >> Best
> >> Ingo
> >>
> >>
> >> On 10.06.22 09:53, godfrey he wrote:
> >>> Hi all,
> >>>
> >>> I would like to open a discussion on FLIP-240:  Introduce "ANALYZE
> >>> TABLE" Syntax.
> >>>
> >>> As FLIP-231 mentioned, statistics are one of the most important inputs
> >>> to the optimizer. Accurate and complete statistics allow the
> >>> optimizer to be more powerful. The "ANALYZE TABLE" syntax is a common
> >>> and effective approach to gathering statistics, and it has already
> >>> been introduced by many compute engines and databases.
> >>>
> >>> The main purpose of this discussion is to introduce "ANALYZE TABLE"
> >>> syntax for Flink SQL.
> >>>
> >>> You can find more details in FLIP-240 document[1]. Looking forward to
> >>> your feedback.
> >>>
> >>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
> >>> [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-240
> >>>
> >>>
> >>> Best,
> >>> Godfrey
