From: Yang Wang <[email protected]>
Date: Fri, Aug 8, 2025, 11:17
Subject: Re: [DISCUSS] FIP-10: Support Log RecordBatch Filter Pushdown
To: <[email protected]>

Hi Cheng,

> Can we clarify that this filter evaluation works on a best-effort basis at
> the beginning of the FIP document? Specifically, it only performs
> coarse-grained block skipping by leveraging RecordBatch statistics. To be
> honest, the table.newScan().filter(recordBatchFilter) API gave me the
> impression that the server side performs row-by-row filtering.

This question relates to something I want to apologize to @HongShun for again: my reply to his review yesterday was not well considered.

To clarify, the previously designed API:

> LogScanner createLogScanner(Predicate recordBatchFilter);

clearly signals to the user that, for log tables, the filter applies to RecordBatches only (not at the row level). If we use a filter()-style fluent API instead, we may lead users to misunderstand the real semantics of the interface.

Best regards,
Yang Wang

Cheng <[email protected]> wrote on Fri, Aug 8, 2025 at 08:38:

> Thanks Yang for driving this work.
>
> Can we clarify that this filter evaluation works on a best-effort basis at
> the beginning of the FIP document? Specifically, it only performs
> coarse-grained block skipping by leveraging RecordBatch statistics. To
> be honest, the table.newScan().filter(recordBatchFilter) API gave me the
> impression that the server side performs row-by-row filtering.
>
> Regards,
> Cheng
>
> ------------------ Original ------------------
> From: "dev" <[email protected]>;
> Date: Thu, Aug 7, 2025 11:11 AM
> To: "dev"<[email protected]>;
> Subject: [DISCUSS] FIP-10: Support Log RecordBatch Filter Pushdown
>
> Hello Fluss Community,
>
> I propose initiating discussion on FIP-10: Support Log RecordBatch Filter
> Pushdown (
> https://cwiki.apache.org/confluence/display/FLUSS/FIP-10%3A+Support+Log+RecordBatch+Filter+Pushdown
> ).
> This optimization aims to improve the performance of Log table queries and
> is now ready for community feedback.
>
> This FIP introduces RecordBatch-level filter pushdown to enable early
> filtering at the storage layer, thereby optimizing CPU, memory, and network
> resources by skipping non-matching log record batches.
>
> A proof-of-concept (PoC) has been implemented in the logfilter branch in
> https://github.com/platinumhamburg/fluss and is ready for testing and
> preview.
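
For readers of this thread, the "best-effort" semantics discussed above can be illustrated with a minimal Java sketch. This is not the Fluss implementation or its API: BatchSkipSketch, Batch, skipBatches, and filterRows are hypothetical names, and the min/max statistics and the "value > threshold" predicate are assumptions chosen only for illustration. The point is that the server-side step drops whole batches whose statistics rule out a match, so non-matching rows inside a surviving batch are still returned and a row-level filter is still needed afterwards.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of batch-level skipping; none of these types are Fluss APIs. */
final class BatchSkipSketch {

    /** A batch of int values plus min/max statistics (assumed stand-in for RecordBatch statistics). */
    static final class Batch {
        final int[] values;
        final int min;
        final int max;

        Batch(int... values) {
            this.values = values;
            int lo = Integer.MAX_VALUE;
            int hi = Integer.MIN_VALUE;
            for (int v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            this.min = lo;
            this.max = hi;
        }
    }

    /** Coarse-grained step: keep a batch only if its max shows that some row may satisfy value > threshold. */
    static List<Batch> skipBatches(List<Batch> batches, int threshold) {
        List<Batch> kept = new ArrayList<>();
        for (Batch b : batches) {
            if (b.max > threshold) {
                kept.add(b); // the whole batch is kept, including rows that do not match
            }
        }
        return kept;
    }

    /** Row-level step: exact filtering over the batches that survived skipping. */
    static List<Integer> filterRows(List<Batch> batches, int threshold) {
        List<Integer> rows = new ArrayList<>();
        for (Batch b : batches) {
            for (int v : b.values) {
                if (v > threshold) {
                    rows.add(v);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Batch> log = List.of(new Batch(1, 2, 3), new Batch(4, 9, 5), new Batch(10, 12));
        List<Batch> returned = skipBatches(log, 8);
        System.out.println(returned.size());          // 2: the first batch is skipped
        System.out.println(filterRows(returned, 8));  // [9, 10, 12]
    }
}
```

In this sketch the second batch is returned in full even though only one of its three rows matches, which is the behavior a name like createLogScanner(Predicate recordBatchFilter) hints at and a generic filter() call may obscure.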
