Hi Yang,

Thanks for the great work; this change indeed reduces the cost of
filtered queries. I have a few clarifying questions:


1. Fluent API design for LogScanner. Currently, we have:

> LogScanner createLogScanner(Predicate recordBatchFilter);
>
Would it be possible to align the interface with the fluent design
pattern used in Jark’s refactoring [1]? For example:

> table.newScan()
>     .project(projectedFields)
>     .filter(recordBatchFilter)
>     .createLogScanner();
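
To make the suggestion concrete, here is a minimal sketch of how such a fluent builder could be shaped. All class names here (FluentScanSketch, Scan, Batch) are hypothetical stand-ins and not the real Fluss classes, and the terminal call returns a description string rather than an actual scanner.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Hypothetical stand-ins for illustration only; not the real Fluss API.
final class FluentScanSketch {

    /** Placeholder for a record batch; it only carries a max value here. */
    static final class Batch {
        final long max;

        Batch(long max) {
            this.max = max;
        }
    }

    /** Immutable builder: each call returns a new Scan with one more option set. */
    static final class Scan {
        private final int[] projectedFields;        // null means "all fields"
        private final Predicate<Batch> batchFilter; // null means "no filter"

        Scan() {
            this(null, null);
        }

        private Scan(int[] projectedFields, Predicate<Batch> batchFilter) {
            this.projectedFields = projectedFields;
            this.batchFilter = batchFilter;
        }

        Scan project(int[] fields) {
            return new Scan(fields.clone(), batchFilter);
        }

        Scan filter(Predicate<Batch> filter) {
            return new Scan(projectedFields, filter);
        }

        /** Terminal call; returns a description instead of a real scanner. */
        String createLogScanner() {
            return "LogScanner(project=" + Arrays.toString(projectedFields)
                    + ", filtered=" + (batchFilter != null) + ")";
        }
    }

    public static void main(String[] args) {
        String scanner = new Scan()
                .project(new int[] {0, 2})
                .filter(b -> b.max >= 10)
                .createLogScanner();
        System.out.println(scanner);
    }
}
```

The point of this shape is that the filter becomes one more optional step on the scan, so adding it does not require another createLogScanner overload.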

2. LogRecordBatchStatistics now supports min, max, and null count, and will
be serialized into RecordBatch headers (requiring an upgrade from V1 to V2
format). If we plan to support additional statistics in the future, will we
need to upgrade to V3? Or has V2 already been designed with extensibility
in mind?
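
As an illustration of what "designed with extensibility" could mean here, below is a sketch of a tag-length-value layout in which a reader skips statistics it does not recognize. This is purely an assumption for discussion: the tags, the layout, and the class StatsTlvSketch are hypothetical and are not the format proposed in FIP-10.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical layout for illustration only; not the FIP-10 wire format.
final class StatsTlvSketch {
    static final byte TAG_MIN = 1;
    static final byte TAG_MAX = 2;
    static final byte TAG_NULL_COUNT = 3;

    /** Writes each entry as [tag: 1 byte][length: 4 bytes][value: 8 bytes]. */
    static byte[] encode(Map<Byte, Long> stats) {
        ByteBuffer buf = ByteBuffer.allocate(stats.size() * (1 + 4 + 8));
        for (Map.Entry<Byte, Long> e : stats.entrySet()) {
            buf.put(e.getKey()).putInt(8).putLong(e.getValue());
        }
        return buf.array();
    }

    /** Reads only the tags this reader knows; unknown tags are skipped by length. */
    static Map<Byte, Long> decodeKnown(byte[] bytes) {
        Map<Byte, Long> known = new HashMap<>();
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        while (buf.hasRemaining()) {
            byte tag = buf.get();
            int length = buf.getInt();
            if (tag == TAG_MIN || tag == TAG_MAX || tag == TAG_NULL_COUNT) {
                known.put(tag, buf.getLong()); // known tags are 8 bytes in this sketch
            } else {
                buf.position(buf.position() + length); // skip a statistic added later
            }
        }
        return known;
    }

    public static void main(String[] args) {
        Map<Byte, Long> written = new LinkedHashMap<>();
        written.put(TAG_MIN, 5L);
        written.put(TAG_MAX, 42L);
        written.put((byte) 9, 7L); // a statistic from a hypothetical newer writer
        System.out.println(decodeKnown(encode(written)));
    }
}
```

With a layout along these lines, adding a new statistic would only need a new tag, and older readers would ignore it without a V3 bump.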

3. When SupportsFilterPushDown#applyFilters pushes filters down to the
source, how does the source determine whether a filter can actually be
pushed down? Even if the user is on the latest version of Fluss that
supports V2 format, existing data might still be in V1 format (which
doesn’t include statistics). Will this compatibility issue be handled on
the client side?
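
To spell out the behavior I have in mind for batches without statistics, here is a small sketch of a conservative check for a filter of the form "value > threshold". The types (StatsFallbackSketch, Stats) are hypothetical, and whether this check runs on the client or the server is exactly the open question; the sketch only shows that a batch with no statistics must be kept and filtered row by row later.

```java
import java.util.Optional;

// Hypothetical types for illustration only; not Fluss classes.
final class StatsFallbackSketch {

    /** Per-batch min/max for a single column. */
    static final class Stats {
        final long min;
        final long max;

        Stats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Filter "value > threshold": a batch may be skipped only when its
     * statistics prove that no row can match.
     */
    static boolean mayMatchGreaterThan(Optional<Stats> stats, long threshold) {
        // No statistics (for example a V1 batch): keep the batch.
        return stats.map(s -> s.max > threshold).orElse(true);
    }

    public static void main(String[] args) {
        System.out.println(mayMatchGreaterThan(Optional.empty(), 10));
        System.out.println(mayMatchGreaterThan(Optional.of(new Stats(1, 5)), 10));
        System.out.println(mayMatchGreaterThan(Optional.of(new Stats(1, 20)), 10));
    }
}
```

Under this assumption the pushed-down filter is only a best-effort hint, so the source would still need to apply the filter to the rows it receives.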

Looking forward to your thoughts!


Best,
Hongshun

[1] https://github.com/apache/fluss/issues/340

On Thu, Aug 7, 2025 at 11:12 AM Yang Wang wrote:

> Hello Fluss Community,
>
> I propose initiating discussion on FIP-10: Support Log RecordBatch Filter
> Pushdown
> (https://cwiki.apache.org/confluence/display/FLUSS/FIP-10%3A+Support+Log+RecordBatch+Filter+Pushdown).
> This optimization aims to improve the performance of Log table queries and
> is now ready for community feedback.
>
> This FIP introduces RecordBatch-level filter pushdown to enable early
> filtering at the storage layer, thereby optimizing CPU, memory, and network
> resources by skipping non-matching log record batches.
>
> A proof-of-concept (PoC) has been implemented in the logfilter branch in
> https://github.com/platinumhamburg/fluss and is ready for testing and
> preview.
>
