Re: [DISCUSS] FIP-11: Fluss Python binding on Fluss Rust

Jim Hu Tue, 12 Aug 2025 06:49:59 -0700

Hi, Jark.

Thanks for your detailed review. I'll address all of your concerns.

1.  The FIP’s specification intentionally outlines the complete
interface—including Arrow, Pandas, and DuckDB helpers—to ensure forward
compatibility as the client evolves. The initial release, however, will be
limited to bounded batch operations; it will expose only the scan method
with an explicit end_timestamp and the corresponding to_arrow and to_pandas
conversions.

2. AppendWriter.write_arrow is asynchronous. It returns a coroutine that
completes only after the Arrow table is durably appended, similar to a Java
CompletableFuture.
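
As a rough illustration of that contract, here is a minimal sketch using
only asyncio. FakeAppendWriter and its in-memory list are hypothetical
stand-ins for the real binding and the server; only the name write_arrow
comes from the FIP.

```python
import asyncio


class FakeAppendWriter:
    """Hypothetical stand-in for AppendWriter; not the real binding."""

    def __init__(self):
        self.appended = []  # plays the role of the durable log

    async def write_arrow(self, table):
        # Yield to the event loop once, standing in for the network round trip.
        await asyncio.sleep(0)
        self.appended.append(table)
        # Return the number of tables appended so far as a fake acknowledgement.
        return len(self.appended)


async def main():
    writer = FakeAppendWriter()
    # Calling write_arrow only creates the coroutine; awaiting it resumes
    # the caller once the append has been acknowledged.
    ack = await writer.write_arrow({"id": [1, 2, 3]})
    return writer, ack


writer, ack = asyncio.run(main())
```

The caller decides when to await, so several writes could be started and
awaited together, much like chaining CompletableFutures in Java.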

3. scan returns a lazy iterator that never materialises the entire log.
to_arrow() collects the iterator into a single PyArrow Table—use it only
when the data is small enough to fit in memory.
to_arrow_batch() yields a RecordBatchReader that streams batches
incrementally, keeping memory pressure low and matching the underlying Rust
scanner’s streaming semantics.
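
The difference between the two conversions can be sketched with plain
Python generators. The functions below are hypothetical analogues, not the
binding's API: scan stands in for the lazy iterator, collect_all for
to_arrow(), and stream_stats for consuming to_arrow_batch().

```python
from typing import Iterator, List, Tuple


def scan(total_rows: int, batch_size: int) -> Iterator[List[int]]:
    """Hypothetical lazy scan: produces one batch at a time on demand."""
    for start in range(0, total_rows, batch_size):
        yield list(range(start, min(start + batch_size, total_rows)))


def collect_all(batches: Iterator[List[int]]) -> List[int]:
    """Analogue of to_arrow(): every row ends up in memory at once."""
    rows: List[int] = []
    for batch in batches:
        rows.extend(batch)
    return rows


def stream_stats(batches: Iterator[List[int]]) -> Tuple[int, int]:
    """Analogue of streaming batches: only one batch is held at a time."""
    count = 0
    peak = 0
    for batch in batches:
        count += len(batch)
        peak = max(peak, len(batch))  # largest number of rows held together
    return count, peak


all_rows = collect_all(scan(10, 4))
count, peak = stream_stats(scan(10, 4))
```

Both paths see the same 10 rows, but the streaming path never holds more
than one batch (4 rows here) at a time.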

4. The second parameter was intended to be a timeout in milliseconds.
However, after review, the only configurations we have are bootstrap_server,
request_max_size, writer_acks, writer_retries, and writer_batch_size. I
have revised the Python demo so that it contains only bootstrap_server.
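
To make the shape of such a config concrete, here is a hypothetical sketch
as a plain dataclass. The field names are the five configurations listed
above; the types and the None defaults are assumptions, and this is not the
actual fluss.Config class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    """Hypothetical config holder; types and defaults are assumptions."""

    bootstrap_server: str
    request_max_size: Optional[int] = None
    writer_acks: Optional[str] = None
    writer_retries: Optional[int] = None
    writer_batch_size: Optional[int] = None


# Passing the value by keyword makes its meaning explicit at the call site.
config = Config(bootstrap_server="127.0.0.1:9123")
```

With named fields, a call such as Config("127.0.0.1:9123", 30) no longer
leaves the reader guessing what the second value means.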

Best,
Jim

On Tue, 12 Aug 2025 at 11:33, Jark Wu <[email protected]> wrote:

> Thanks Jim for proposing the python and rust client.
>
> The overall design looks good to me. +1 for Rust bindings (via PyO3 or
> similar) over solutions like Py4J or a separate Python-native
> implementation.
>
> This aligns with the current industry standard for high-performance Python
> integrations, like Lance, Iceberg, and DeltaLake. Leveraging Rust bindings
> will not only deliver better performance but also allow us to maximize code
> reuse from the existing Rust client, reducing maintenance overhead and
> ensuring consistency across language interfaces.
>
> Regarding the FIP, I have the following comments:
>
> 1. Could you clarify the intended scope of this FIP? Specifically, does it
> include integration with PyArrow, Pandas, and DuckDB? The example shows
> DuckDB integration, but it’s also listed under Future Work.
>
> 2. What are the semantics of AppendWriter#write_arrow? Is it a blocking API
> or a non-blocking API? If it is a non-blocking API, can we have something
> like Java CompletableFuture as the return type to let users know when the
> call has succeeded?
>
> 3. What's the behavior of LogScanner#scan_earliest? Does it load all data
> into client memory upfront and return a fully materialized ScanResult, or
> does it support streaming semantics, fetching data incrementally to reduce
> memory pressure?
>
> 4. What's the interface of `fluss.Config`? In the example,
> config = fluss.Config("127.0.0.1:9123", 30) confuses me: what do the
> parameters mean?
>
>
> Best,
> Jark
>
>
>
> On Fri, 8 Aug 2025 at 20:03, Jim Hu <[email protected]> wrote:
>
> > Hi Fluss community,
> > I’d like to kick off the public discussion on FIP-11¹, which proposes a
> > thin PyO3 wrapper around the existing Rust client so that Python users
> > can read/write Fluss without spinning up a JVM.
> >
> >    - Goal: Java-SDK parity (streaming & batch reads/writes, DDL, point
> >    queries) but 100 % Pythonic.
> >    - Implementation: PyO3 bridge compiled to a native .so/.pyd;
> >    zero-copy, lock-free, no sockets, no JVM start-up cost.
> >    - Repository: fluss-rust/bindings/python.
> >
> > I’ve prototyped a minimal PoC at naivedogger/Fluss-Python-Client at
> > python-client
> > <https://github.com/naivedogger/Fluss-Python-Client/tree/python-client>
> > Looking forward to your thoughts.
> > Best regards,
> > Jim
> > [1]
> > https://cwiki.apache.org/confluence/display/FLUSS/FIP-11%3A+Fluss+Python+binding+on+Fluss+Rust
> >
>
