Hi Jingsong,
Thanks for your comments. Here are my responses:

> But also need to re-calculate partition col stats?

No conclusion yet. I will refer to the Parquet design when integrating with Spark, and add it back to the PIP at that time.

> FileStoreCommit.commitStatistics

Will fix.

> Why nullable? Should it be long?

Will fix.

Best,
zouxxyy

On 2024/01/12 07:36:26 Jingsong Li wrote:
> Thanks zouxxyy for starting this discussion.
>
> The design looks good to me overall.
>
> Left some comments:
>
> > Statistics in snapshot (global stats + col stats) with some pruning
> > strategies
> > stats calculated in real-time from splits (only including
> > numRows and totalSize)
>
> But also need to re-calculate partition col stats?
>
> > FileStoreCommit.writeStats
>
> FileStoreCommit.commitStatistics
>
> > Long snapshotId in Stats
>
> Why nullable? Should it be long?
>
> Best,
> Jingsong
>
> On Fri, Jan 12, 2024 at 3:01 PM zouxxyy <[email protected]> wrote:
> >
> > Hi, Paimon Devs, I’d like to start a discussion about PIP-14[1].
> >
> > Table statistics describe the data distribution characteristics of a table.
> > Common statistics include the number of rows, table size, column statistics
> > and more.
> > They are very important for a DBMS, especially when generating query plans
> > and optimizing query performance.
> > This PIP further expands Paimon's existing statistics to support
> > more statistical information.
> >
> > Looking forward to your questions and suggestions.
> >
> > Best, zouxxyy
> >
> > [1] https://cwiki.apache.org/confluence/x/HYokEQ
> >
