Thanks Yang for driving this work. A great improvement. A few questions are below:
1. The Migration Strategy section says "Deploy new version with feature flag disabled", so it seems filter pushdown is controlled by an option. What does the option look like? Is it a client option or a server option? Is it enabled or disabled by default? I haven't seen the option/flag in the FIP.
2. It seems the RPC request `FetchLogRequest` needs to change to include `predicates`. Could you describe what changes in `FetchLogRequest`? It is also part of the public interface changes.
3. I noticed `PbLiteralValue`/`PbDataType` in the Protocol Buffer Definitions, but I haven't seen their definitions. Are they missing?
4. I'm curious how you plan to serialize `LogRecordBatchStatistics`. Will you reuse the encoding of the Fluss compacted row, or something else?

Best regards,
Yuxia

----- Original Message -----
From: "loserwang1024" <[email protected]>
To: "dev" <[email protected]>
Sent: Thursday, August 7, 2025 2:18:37 PM
Subject: Re: [DISCUSS] FIP-10: Support Log RecordBatch Filter Pushdown

Hi Yang,

Thanks for your great work. This change indeed reduces the cost of filtered queries. I just have a few questions for clarification:

1. Fluent API Design for LogScanner
Currently, we have:
> LogScanner createLogScanner(Predicate recordBatchFilter);
Would it be possible to make the interface more aligned with the fluent design pattern used in Jark’s refactoring? [1] For example:
> table.newScan()
>     .project(projectedFields)
>     .filter(recordBatchFilter)
>     .createLogScanner();

2. LogRecordBatchStatistics now supports min, max, and null count, and will be serialized into RecordBatch headers (requiring an upgrade from V1 to V2 format). If we plan to support additional statistics in the future, will we need to upgrade to V3? Or has V2 already been designed with extensibility in mind?

3. When SupportsFilterPushDown#applyFilters pushes filters down to the source, how does the source determine whether a filter can actually be pushed down?
Even if the user is on the latest version of Fluss that supports V2 format, existing data might still be in V1 format (which doesn’t include statistics). Will this compatibility issue be handled on the client side?

Looking forward to your thoughts!

Best
Hongshun

[1] https://github.com/apache/fluss/issues/340

On Thu, Aug 7, 2025 at 11:12 AM Yang Wang <[email protected]> wrote:

> Hello Fluss Community,
>
> I propose initiating discussion on FIP-10: Support Log RecordBatch Filter
> Pushdown (
> https://cwiki.apache.org/confluence/display/FLUSS/FIP-10%3A+Support+Log+RecordBatch+Filter+Pushdown
> ).
> This optimization aims to improve the performance of Log table queries and
> is now ready for community feedback.
>
> This FIP introduces RecordBatch-level filter pushdown to enable early
> filtering at the storage layer, thereby optimizing CPU, memory, and network
> resources by skipping non-matching log record batches.
>
> A proof-of-concept (PoC) has been implemented in the logfilter branch in
> https://github.com/platinumhamburg/fluss and is ready for testing and
> preview.
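
To make the V1/V2 compatibility question concrete, here is a minimal sketch of how batch-level skipping on min/max/null-count statistics could treat a batch that carries no statistics. All names here (`BatchStats`, `BatchFilterSketch`, `mayMatchGreaterThan`, `mayMatchIsNull`) are hypothetical and are not taken from the FIP or the PoC branch. It only assumes that a batch may be skipped when its statistics prove that no row can match, and that a batch without statistics is always read.

```java
import java.util.Optional;

/** Hypothetical per-batch statistics for a single long column (not the FIP's class). */
final class BatchStats {
    final long min;
    final long max;
    final long nullCount;

    BatchStats(long min, long max, long nullCount) {
        this.min = min;
        this.max = max;
        this.nullCount = nullCount;
    }
}

/** Hypothetical helper that decides whether a batch can be skipped. */
final class BatchFilterSketch {
    private BatchFilterSketch() {}

    /**
     * Returns false only when the statistics prove that no row in the batch
     * can satisfy "column > threshold". A batch without statistics (for
     * example, one written in the older format) is always kept.
     */
    static boolean mayMatchGreaterThan(Optional<BatchStats> stats, long threshold) {
        if (!stats.isPresent()) {
            return true; // nothing can be proven, so the batch must be read
        }
        return stats.get().max > threshold;
    }

    /** Same idea for "column IS NULL", using the null count. */
    static boolean mayMatchIsNull(Optional<BatchStats> stats) {
        if (!stats.isPresent()) {
            return true; // no statistics: keep the batch
        }
        return stats.get().nullCount > 0;
    }
}
```

Under this assumption, a missing-statistics batch only loses the optimization and never loses rows, so old and new batches could coexist in the same log without a client-side special case.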
