Re: [DISCUSS] FIP-10: Support Log RecordBatch Filter Pushdown

yuxia Thu, 07 Aug 2025 01:56:50 -0700

Thanks Yang for driving this work. A great improvement. A few questions are 
below:

1. The Migration Strategy section says "Deploy new version with feature flag 
disabled", so it seems filter pushdown is controlled by an option? What does 
the option look like: is it a client or a server option? Is it enabled or 
disabled by default? I haven't seen the option/flag in the FIP.

2. It seems the RPC request `FetchLogRequest` should change to include 
`predicates`? Could you describe what changes in `FetchLogRequest`? It's also 
part of the public interface changes.

3. I noticed `PbLiteralValue`/`PbDataType` in the Protocol Buffer Definitions, 
but haven't seen their definitions. Are they missing?

4. I'm curious how you plan to serialize `LogRecordBatchStatistics`: will you 
reuse the encoding of the Fluss compacted row, or something else?

Best regards,
Yuxia

----- Original Message -----
From: "loserwang1024" <[email protected]>
To: "dev" <[email protected]>
Sent: Thursday, August 7, 2025 2:18:37 PM
Subject: Re: [DISCUSS] FIP-10: Support Log RecordBatch Filter Pushdown

Hi Yang,

Thanks for your great work — this change indeed reduces the cost of
filtered queries. I just have a few questions for clarification:

1. Fluent API design for LogScanner. Currently, we have:

> LogScanner createLogScanner(Predicate recordBatchFilter);
>
Would it be possible to make the interface more aligned with the fluent
design pattern used in Jark’s refactoring? [1] For example:

> table.newScan()
>     .project(projectedFields)
>     .filter(recordBatchFilter)
>     .createLogScanner();
>
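
To make the fluent shape above concrete, here is a minimal sketch of an
immutable builder in that style. `ScanSketch`, its `long[]` row type, and
`describe()` are hypothetical stand-ins invented for illustration; they are
not the Fluss `Table`/`Scan` API or anything defined in the FIP.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Hypothetical immutable scan builder; names are illustrative, not the Fluss API.
final class ScanSketch {
    private final int[] projectedFields;     // null means "all fields"
    private final Predicate<long[]> filter;  // null means "no filter"

    ScanSketch() {
        this(null, null);
    }

    private ScanSketch(int[] projectedFields, Predicate<long[]> filter) {
        this.projectedFields = projectedFields;
        this.filter = filter;
    }

    // Each step returns a new builder, so a partially configured scan can be reused.
    ScanSketch project(int... fields) {
        return new ScanSketch(fields.clone(), filter);
    }

    ScanSketch filter(Predicate<long[]> recordBatchFilter) {
        return new ScanSketch(projectedFields, recordBatchFilter);
    }

    // Stand-in for createLogScanner(): only reports the accumulated configuration.
    String describe() {
        String projection =
                projectedFields == null ? "all" : Arrays.toString(projectedFields);
        return "projection=" + projection + ", filter=" + (filter == null ? "none" : "set");
    }
}
```

With this shape, a new option such as the filter becomes one more chained call
rather than another `createLogScanner` overload.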

2. LogRecordBatchStatistics now supports min, max, and null count, and will
be serialized into RecordBatch headers (requiring an upgrade from V1 to V2
format). If we plan to support additional statistics in the future, will we
need to upgrade to V3? Or has V2 already been designed with extensibility
in mind?
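
As one illustration of what "designed with extensibility in mind" could mean,
the sketch below encodes statistics as tag/length/value entries so that a
reader skips tags it does not know. This is purely an assumption made for
discussion: the tag numbers, the `StatsTlvSketch` class, and the layout are
hypothetical and are not the V2 format described in the FIP.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical tag-length-value layout for per-batch statistics.
final class StatsTlvSketch {
    static final byte TAG_MIN = 1;
    static final byte TAG_MAX = 2;
    static final byte TAG_NULL_COUNT = 3;

    // Writes each entry as: 1-byte tag, 1-byte payload length, 8-byte long payload.
    static byte[] encode(Map<Byte, Long> stats) {
        ByteBuffer buf = ByteBuffer.allocate(stats.size() * (2 + Long.BYTES));
        for (Map.Entry<Byte, Long> entry : stats.entrySet()) {
            buf.put(entry.getKey());
            buf.put((byte) Long.BYTES);
            buf.putLong(entry.getValue());
        }
        return buf.array();
    }

    // Reads only the tags this reader understands; anything else is skipped by length.
    static Map<Byte, Long> decodeKnown(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        Map<Byte, Long> known = new HashMap<>();
        while (buf.hasRemaining()) {
            byte tag = buf.get();
            int length = buf.get() & 0xFF;
            boolean isKnown = tag == TAG_MIN || tag == TAG_MAX || tag == TAG_NULL_COUNT;
            if (isKnown && length == Long.BYTES) {
                known.put(tag, buf.getLong());
            } else {
                buf.position(buf.position() + length); // unknown statistic: skip payload
            }
        }
        return known;
    }
}
```

Under a layout like this, adding a new statistic would not by itself force a
bump to V3, because older readers ignore the extra entries.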

3. When SupportsFilterPushDown#applyFilters pushes filters down to the
source, how does the source determine whether a filter can actually be
pushed down? Even if the user is on the latest version of Fluss that
supports V2 format, existing data might still be in V1 format (which
doesn’t include statistics). Will this compatibility issue be handled on
the client side?
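
The compatibility concern can be sketched as follows: a batch may be skipped
only when its statistics prove that no row can match, and a batch without
statistics (for example one written in V1 format) must always be read and
filtered row by row. The classes below are hypothetical and use a single
`long` column for brevity; they are not the FIP's `LogRecordBatchStatistics`,
and the sketch assumes min/max describe at least one non-null value.

```java
import java.util.Optional;

// Hypothetical per-batch statistics for one long column.
final class BatchStatsSketch {
    final long min;
    final long max;
    final long nullCount;

    BatchStatsSketch(long min, long max, long nullCount) {
        this.min = min;
        this.max = max;
        this.nullCount = nullCount;
    }
}

final class BatchFilterSketch {
    // Returns false only if the statistics prove no row satisfies "col > threshold".
    static boolean mayContainGreaterThan(Optional<BatchStatsSketch> stats, long threshold) {
        if (!stats.isPresent()) {
            return true; // no statistics (e.g. V1 batch): cannot skip safely
        }
        return stats.get().max > threshold;
    }

    // Returns false only if the statistics prove no row satisfies "col = value".
    static boolean mayContainEqualTo(Optional<BatchStatsSketch> stats, long value) {
        if (!stats.isPresent()) {
            return true;
        }
        BatchStatsSketch s = stats.get();
        return s.min <= value && value <= s.max;
    }

    // Returns false only if the statistics prove the column has no nulls in this batch.
    static boolean mayContainNull(Optional<BatchStatsSketch> stats) {
        return !stats.isPresent() || stats.get().nullCount > 0;
    }
}
```

Because a `true` result only means "might match", whichever side evaluates
the predicate still has to apply the filter to the rows of every batch that
is not skipped.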

Looking forward to your thoughts!

Best

Hongshun

[1] https://github.com/apache/fluss/issues/340

On Thu, Aug 7, 2025 at 11:12 AM Yang Wang <[email protected]> wrote:

> Hello Fluss Community,
>
> I propose initiating discussion on FIP-10: Support Log RecordBatch Filter
> Pushdown (
>
> https://cwiki.apache.org/confluence/display/FLUSS/FIP-10%3A+Support+Log+RecordBatch+Filter+Pushdown
> ).
> This optimization aims to improve the performance of Log table queries and
> is now ready for community feedback.
>
> This FIP introduces RecordBatch-level filter pushdown to enable early
> filtering at the storage layer, thereby optimizing CPU, memory, and network
> resources by skipping non-matching log record batches.
>
> A proof-of-concept (PoC) has been implemented in the logfilter branch in
> https://github.com/platinumhamburg/fluss and is ready for testing and
> preview.
>
