Hi Anton,
thanks for the review. These are great points, worth thinking through
carefully. I want to engage with each one directly, but also raise a
concern about whether they meet the bar for blocking this
implementation, given where we are in the voting process.

*On the session lifecycle machinery being a burden*
The ScannerManager, with its TTL eviction, per-bucket/server limits, and
leadership-change cleanup, is pretty straightforward. Fluss already carries
analogous lifecycle machinery for log fetch sessions and snapshot resource
leases, so I don't think the operational cost here is qualitatively
different from what we already manage. It is well understood, it has tests,
and the configuration surface is minimal (kv.scanner.ttl,
kv.scanner.max-per-bucket, kv.scanner.max-per-server). I would not call the
presence of this machinery a design flaw.
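
To make the claim concrete, here is a rough sketch of the essential shape
of that machinery. The names are illustrative, not the actual Fluss
classes, and the per-bucket limit is elided for brevity:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch only: a session registry with TTL eviction and
    // a server-wide cap, the essence of what the ScannerManager does.
    final class ScannerSessionRegistry {

        static final class Session {
            volatile long lastAccessMillis = System.currentTimeMillis();
        }

        private final Map<Long, Session> sessions = new ConcurrentHashMap<>();
        private final long ttlMillis;   // kv.scanner.ttl
        private final int maxPerServer; // kv.scanner.max-per-server
        private final ScheduledExecutorService reaper =
                Executors.newSingleThreadScheduledExecutor();

        ScannerSessionRegistry(long ttlMillis, int maxPerServer) {
            this.ttlMillis = ttlMillis;
            this.maxPerServer = maxPerServer;
            // Periodically evict sessions idle past the TTL;
            // leadership-change cleanup would drop the affected entries.
            reaper.scheduleAtFixedRate(
                    this::evictExpired, ttlMillis, ttlMillis,
                    TimeUnit.MILLISECONDS);
        }

        boolean register(long scannerId) {
            if (sessions.size() >= maxPerServer) {
                return false; // server-wide cap reached, reject the open
            }
            return sessions.putIfAbsent(scannerId, new Session()) == null;
        }

        void touch(long scannerId) {
            Session s = sessions.get(scannerId);
            if (s != null) {
                s.lastAccessMillis = System.currentTimeMillis();
            }
        }

        private void evictExpired() {
            long now = System.currentTimeMillis();
            sessions.entrySet().removeIf(
                    e -> now - e.getValue().lastAccessMillis > ttlMillis);
        }
    }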

*On the continuation-token / Cassandra-style alternative*
This is a great suggestion and I want to give it a fair hearing. You are
correct that an opaque server-produced token shifts the client-side
contract in a nice way: the client never constructs or interprets position
bytes, it just echoes them back. That would be a real ergonomic
improvement.

However, for Fluss's primary scan use cases the token approach has a
non-trivial correctness cost:

The log_offset we return on the first response is a commitment: this scan
reflects KV state as of this log position. That guarantee is what makes the
CDC bootstrap pattern correct, since you read all KV rows, then replay the
log from log_offset, and you know the two halves are consistent. A purely
stateless token approach (fresh snapshot per page) breaks this guarantee.
You cannot claim a log_offset that covers a scan made of pages from
different snapshots.
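
To spell the pattern out as pseudocode (the type and method names here are
hypothetical stand-ins, not the real client API):

    // Sketch of the CDC bootstrap pattern this guarantee serves.
    void bootstrap(KvScanClient client, TableBucket bucket) {
        ScanSession scan = client.openKvScan(bucket);
        // The first response pins log_offset. The pattern is only sound
        // if every later page reflects KV state as of this same position.
        long logOffset = scan.snapshotLogOffset();
        while (scan.hasNext()) {
            materialize(scan.nextBatch()); // full KV state as of logOffset
        }
        // Replay the changelog from exactly the pinned offset: nothing is
        // missed and nothing is double-counted relative to the scan.
        client.subscribeLogFrom(bucket, logOffset);
    }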

A hybrid — keep the snapshot alive, drop the iterator — preserves the
guarantee but still requires server-side state. You have traded the
iterator for a pinned snapshot; the ScannerManager does not go away, it
just manages slightly fewer bytes per session. The seek cost per batch is
added on top.
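
A sketch of what the hybrid's page handler would look like on the server,
again with illustrative names only, makes the point visible:

    // Option B hybrid: continuation token plus a pinned snapshot. Note
    // the server-side state does not disappear: the pinned snapshot must
    // outlive each request, so a manager with TTLs and limits is still
    // needed; only the iterator is dropped between pages.
    ScanPage nextPage(byte[] token) {
        TokenPayload t = TokenPayload.decode(token);  // opaque to clients
        PinnedSnapshot snap = snapshots.get(t.snapshotId()); // still state!
        try (KvIterator it = snap.newIterator()) {
            it.seekAfter(t.lastKeyServed());  // the added seek per batch
            return serveBatch(it, snap.logOffset());
        }
    }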

So Option B is not strictly simpler: it is roughly the same complexity as
Option A for the use cases that matter, with an added seek per batch and
without the live iterator's natural progress tracking.
Option C (fully stateless, no snapshot isolation) is simpler, and it is a
valid approach for bulk export or admin tooling. However, I don't think it
is appropriate as the primary mode for this FIP's target use cases.

*On retries*
You are correct that a mid-scan leader failover cannot be recovered
transparently under the current design: the session is gone and the client
must restart. This is a real limitation, but I would argue it is not a
correctness problem: it is consistent with how Flink handles source
restarts in general (replay from checkpoint). The callSeqId mechanism
cleanly handles the more common case, a transient network failure with the
same leader still alive.
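
For reference, this is the shape of that mechanism (illustrative, not the
exact server code): the server remembers the last sequence it answered, so
a retried request replays the cached batch instead of advancing twice.

    ScanBatch fetch(long scannerId, int callSeqId) {
        ScannerSession s = sessions.get(scannerId);
        if (callSeqId == s.lastSeqId()) {
            return s.lastBatch(); // duplicate: replay, don't advance
        }
        if (callSeqId != s.lastSeqId() + 1) {
            throw new IllegalStateException("out-of-order callSeqId");
        }
        ScanBatch batch = s.iterator().nextBatch();
        s.remember(callSeqId, batch); // cache in case the response is lost
        return batch;
    }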

Improving the failover story is a good follow-up item. A resume_hint field
on the response (the last key served, opaque bytes) would let a client that
detects session loss open a new session and skip already-processed rows,
without changing the core protocol. This is purely additive and does not
require redesigning the session model.
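
On the client side the flow would look roughly like this (hypothetical
field and method names; the hint is opaque bytes the client merely echoes
back, never interprets):

    byte[] resumeHint = null;
    while (scan.hasNext()) {
        try {
            ScanBatch b = scan.nextBatch();
            resumeHint = b.resumeHint(); // last key served in this batch
            process(b);
        } catch (ScannerSessionLostException e) {
            // Open a fresh session and skip already-processed rows.
            scan = client.openKvScan(bucket, /* startAfterKey= */ resumeHint);
        }
    }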

*On the FIP needing an alternatives comparison*
This would indeed be useful. However, given how long this FIP has been
open, the fact that the VOTE closes today, and that this is a user-facing
feature whose real value we won't know until it is out there for users to
use, I would prefer to close the VOTE, get something out there, and
iterate.

The points both you and Lorenzo raised are real engineering tradeoffs, and
I appreciate them being surfaced. But I do not think any of them represents
a correctness flaw in the current design, a safety hazard, or a decision
that forecloses future evolution. The live-session model is sound for the
target use cases, the machinery is bounded and tested, and the points
raised are things we can iterate on. Adding the alternatives section to the
FIP addresses the most concrete ask.

I would like to proceed with the vote on that basis. If anyone has a
specific concern they believe is blocking (a correctness issue, a protocol
commitment we cannot evolve away from, or an operational risk we haven't
accounted for), then please leave your VOTE as -1 and we can park this for
now.

Let me know your thoughts. As I mentioned, I would like to proceed with
the VOTE since it is due today, and we can park it if you think these are
indeed blocking issues.

Best,
Giannis
