Fwd: [DISCUSS] FIP-10: Support Log RecordBatch Filter Pushdown

Yang Wang Thu, 07 Aug 2025 22:48:01 -0700

From: Yang Wang <[email protected]>
Date: Fri, Aug 8, 2025, 11:17
Subject: Re: [DISCUSS] FIP-10: Support Log RecordBatch Filter Pushdown
To: <[email protected]>



Hi Cheng,

> Can we clarify that this filter evaluation works on a best-effort basis
> at the beginning of the FIP document? Specifically, it only performs
> coarse-grained block skipping by leveraging RecordBatch statistics. To
> be honest, the table.newScan().filter(recordBatchFilter) API gave me the
> impression that the server side performs row-by-row filtering.

This question relates to something I want to apologize to @HongShun for
again: my reply to his review yesterday was not well considered. To
clarify, the previously designed API was:

> LogScanner createLogScanner(Predicate recordBatchFilter);

This signature clearly hints to the user that the filter applies only at
the RecordBatch level (not at the row level) for log tables. If we use a
fluent API such as filter(), we may lead users to misunderstand the real
semantics of the interface.
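
To make the distinction concrete, here is a minimal sketch of what
"best-effort" batch-level skipping means. It is only an illustration: the
class, the stats shape (a min/max pair for one int column), and the method
names are all hypothetical and are not the Fluss API or the PoC code. It
assumes a predicate of the form "value > threshold".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types belong to the actual Fluss API. */
public class BatchSkipSketch {

    /** A batch of rows plus assumed min/max statistics for one int column. */
    static final class Batch {
        final int min;
        final int max;
        final List<Integer> rows;

        Batch(int min, int max, List<Integer> rows) {
            this.min = min;
            this.max = max;
            this.rows = rows;
        }
    }

    /** Server side: drop only the batches that provably contain no match. */
    static List<Batch> pushDown(List<Batch> batches, int threshold) {
        List<Batch> kept = new ArrayList<>();
        for (Batch b : batches) {
            // A batch may contain "value > threshold" only if its max exceeds it.
            if (b.max > threshold) {
                kept.add(b);
            }
        }
        return kept;
    }

    /** Client side: a row-level filter is still needed for exact results. */
    static List<Integer> rowFilter(List<Batch> batches, int threshold) {
        List<Integer> out = new ArrayList<>();
        for (Batch b : batches) {
            for (int v : b.rows) {
                if (v > threshold) {
                    out.add(v);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Batch> log = List.of(
                new Batch(1, 5, List.of(1, 3, 5)),
                new Batch(4, 12, List.of(4, 12, 7)),
                new Batch(20, 30, List.of(20, 25, 30)));

        List<Batch> kept = pushDown(log, 10);
        System.out.println(kept.size());         // 2: only the first batch is skipped
        System.out.println(rowFilter(kept, 10)); // [12, 20, 25, 30]
    }
}
```

Note that the second batch is returned whole, including the rows 4 and 7
that do not match. That is the behavior a filter()-style name could hide
from users.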

Best regards,
Yang


On Fri, Aug 8, 2025 at 08:38, Wang Cheng <[email protected]> wrote:

> Thanks Yang for driving this work.
>
>
> Can we clarify that this filter evaluation works on a best-effort basis at
> the beginning of the FIP document? Specifically, it only performs
> coarse-grained block skipping by leveraging RecordBatch statistics. To
> be honest, the table.newScan().filter(recordBatchFilter) API gave me the
> impression that the server side performs row-by-row filtering.
>
>
>
> Regards,
> Cheng
>
> ------------------ Original ------------------
> From: "dev" <[email protected]>;
> Date: Thu, Aug 7, 2025 11:11 AM
> To: "dev" <[email protected]>;
>
> Subject: [DISCUSS] FIP-10: Support Log RecordBatch Filter Pushdown
>
>
>
> Hello Fluss Community,
>
> I propose initiating discussion on FIP-10: Support Log RecordBatch Filter
> Pushdown
> (https://cwiki.apache.org/confluence/display/FLUSS/FIP-10%3A+Support+Log+RecordBatch+Filter+Pushdown).
> This optimization aims to improve the performance of Log table queries and
> is now ready for community feedback.
>
> This FIP introduces RecordBatch-level filter pushdown to enable early
> filtering at the storage layer, thereby optimizing CPU, memory, and network
> resources by skipping non-matching log record batches.
>
> A proof-of-concept (PoC) has been implemented in the logfilter branch in
> https://github.com/platinumhamburg/fluss and is ready for testing and
> preview.
