Hi godfrey,

Thanks for driving this meaningful topic. I agree that statistics are essential for the optimizer; I'm just wondering in which situations users would actually need this. From the user's side, optimization should be handled by the framework, and users may not want to think too much about it. Could you share more scenarios in which users would run 'ANALYZE TABLE' themselves?
nit: There may be a mistake in the partition table under Examples (Examples#partition table). The partition info should be:
Partition1: (ds='2022-06-01', hr=1)
Partition2: (ds='2022-06-01', hr=2)
Partition3: (ds='2022-06-02', hr=1)
Partition4: (ds='2022-06-02', hr=2)

Best,
zoucao

godfrey he <godfre...@gmail.com> wrote on Fri, Jun 10, 2022 at 15:54:
> Hi all,
>
> I would like to open a discussion on FLIP-240: Introduce "ANALYZE
> TABLE" Syntax.
>
> As FLIP-231 mentioned, statistics are one of the most important inputs
> to the optimizer. Accurate and complete statistics allow the
> optimizer to be more powerful. "ANALYZE TABLE" syntax is a very common
> but effective approach to gather statistics, which is already
> introduced by many compute engines and databases.
>
> The main purpose of this discussion is to introduce "ANALYZE TABLE" syntax
> for Flink SQL.
>
> You can find more details in FLIP-240 document[1]. Looking forward to
> your feedback.
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
> [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-240
>
>
> Best,
> Godfrey
>