Re: [DISCUSS] FIP-16: auto-increment column

Jark Wu Mon, 27 Oct 2025 20:03:54 -0700

Thank you Cheng,

The updates look good to me.


Best,
Jark

On Mon, 27 Oct 2025 at 17:27, Wang Cheng <[email protected]> wrote:

> Thank you Jark for your comments.
>
>
> > I think we need to support the INT type for the AUTO_INCREMENT column,
> > because roaring bitmap32 is more commonly used: it is enough for
> > most cases and cheaper than rbm64.
>
>
> I agree with this point. Roaring Bitmap32 is more prevalent due to its
> better performance. I have revised the FIP to specify that the data type of
> the AUTO_INCREMENT column must be INT or BIGINT.
>
>
> > The znode looks like an index node for the table. It would be better to
> > rename it to something more descriptive to improve clarity:
> > /metadata/databases/[databaseName]/tables/[tableName]/auto_inc/col_[columnIdx]
>
>
> I have renamed the znode path according to your suggestion.
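
For illustration, the renamed path template can be expanded as in the small sketch below. The class and method names (`AutoIncZkPathSketch`, `autoIncColumnPath`) are hypothetical and exist only for this example; only the path layout comes from the thread.

```java
/** Hypothetical helper that expands the renamed znode path template; not a Fluss API. */
final class AutoIncZkPathSketch {
    private AutoIncZkPathSketch() {}

    /** Builds /metadata/databases/[databaseName]/tables/[tableName]/auto_inc/col_[columnIdx]. */
    static String autoIncColumnPath(String databaseName, String tableName, int columnIdx) {
        return "/metadata/databases/" + databaseName
                + "/tables/" + tableName
                + "/auto_inc/col_" + columnIdx;
    }
}
```

For a database `mydb`, a table `users`, and column index 1 (all made-up values), this yields `/metadata/databases/mydb/tables/users/auto_inc/col_1`.
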
>
>
> > What happens to other columns? If this only updates the columns other
> > than the auto-incremented column, it will be very strange behavior and
> > will conflict with the upsert semantics. According to the previous
> > discussion, we don't allow insert/upsert on the auto-incremented column, so
> > this should throw an exception directly without checking the nullability of
> > the auto-incremented column.
>
>
> I think there is some ambiguity in the UPSERT section. The UPSERT behavior
> has been clarified as follows: "Assigning values to an AUTO_INCREMENT
> column during UPSERT operations is prohibited, irrespective of target row
> existence. If the row is newly inserted into the table, Fluss fills the
> AUTO_INCREMENT column with a new auto-incremented ID."
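
A minimal sketch of the clarified UPSERT rule, assuming a hypothetical in-memory table keyed by primary key. `AutoIncUpsertSketch`, `Row`, `upsert`, and the in-process counter are illustrative names rather than Fluss APIs, and keeping the existing ID when the row already exists is an assumption based on the earlier FIP wording.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory model of the clarified UPSERT rule; not Fluss code. */
final class AutoIncUpsertSketch {
    /** A stored row: the user-supplied value plus the generated ID. */
    static final class Row {
        final String value;
        final long autoIncId;

        Row(String value, long autoIncId) {
            this.value = value;
            this.autoIncId = autoIncId;
        }
    }

    private final Map<String, Row> rows = new HashMap<>();
    // Stand-in for the real ID source (assumption: a simple in-process counter).
    private final AtomicLong nextId = new AtomicLong(1);

    /**
     * Upserts a row. userSuppliedId must be null: assigning the AUTO_INCREMENT
     * column is rejected whether or not the target row exists.
     */
    Row upsert(String primaryKey, String value, Long userSuppliedId) {
        if (userSuppliedId != null) {
            throw new IllegalArgumentException(
                    "AUTO_INCREMENT column cannot be assigned in UPSERT");
        }
        Row existing = rows.get(primaryKey);
        // New row: generate an ID. Existing row: keep its ID (assumption).
        long id = (existing == null) ? nextId.getAndIncrement() : existing.autoIncId;
        Row updated = new Row(value, id);
        rows.put(primaryKey, updated);
        return updated;
    }
}
```

The check on `userSuppliedId` runs before the existence lookup, which mirrors the "irrespective of target row existence" wording.
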
>
>
> > I suggest not introducing the `AutoIncIDBuffer` in the first version,
> > or at least not introducing these 2 options, as they become public API,
> > because this may conflict with the persisted ID approach that we will soon
> > introduce.
>
>
> I have removed those 2 options from the FIP, considering that they may not be
> compatible with our upcoming bucket state snapshot feature. I have also
> revised the failover handling section accordingly: "To eliminate gaps
> during failover, local cached IDs will be persisted as part of the upcoming
> bucket state snapshot, ensuring their availability after server restart."
>
>
>
>
>
> Regards,
> Cheng
>
> ------------------ Original ------------------
> From: "dev" <[email protected]>
> Date: Sun, Oct 26, 2025 04:33 PM
> To: "dev" <[email protected]>
>
> Subject: Re: [DISCUSS] FIP-16: auto-increment column
>
>
>
> Hi Cheng,
>
> Sorry for interrupting the vote. However, I spotted some issues we may
> need to address.
>
> > The data type of the AUTO_INCREMENT column must be BIGINT.
>
> I think we need to support the INT type for the AUTO_INCREMENT column, because
> roaring bitmap32 is more commonly used: it is enough for most cases
> and cheaper than rbm64.
>
> > For UPSERT operations, the following situations occur: If the row already
> > exists in the table, Fluss does not update the auto-incremented ID.
>
> What happens to other columns? If this only updates the columns other than
> the auto-incremented column, it will be very strange behavior and will
> conflict with the upsert semantics.
> According to the previous discussion, we don't allow insert/upsert on the
> auto-incremented column, so this should throw an exception directly without
> checking the nullability of the auto-incremented column.
>
> > ZooKeeper with the znode path being
> > /metadata/databases/[databaseName]/tables/[tableName]/autoinc/idx_[columnIdx]
>
> The znode looks like an index node for the table. It would be better to
> rename it to something more descriptive to improve clarity:
>
> /metadata/databases/[databaseName]/tables/[tableName]/auto_inc/col_[columnIdx]
>
> > The prefetch batch size and low watermark ratio are controlled by
> > configuration parameters table.auto_inc_cache_size and
> > table.auto_inc_low_water_mark_size_ratio respectively.
>
> I suggest not introducing the `AutoIncIDBuffer` in the first version, or at
> least not introducing these 2 options, as they become public API, because
> this may conflict with the persisted ID approach that we will soon
> introduce.
>
>
> Best,
> Jark
>
> On Tue, 23 Sept 2025 at 10:13, Wang Cheng <[email protected]> wrote:
>
> > Hi Giannis,
> >
> > Thanks for your comments.
> >
> > 1. That makes sense. I'll update the enableAutoIncrement() method to
> > accept the column name as a parameter.
> > 2. Once the local cached IDs are used up, the bucket will request a new
> > batch from ZooKeeper.
> > 3. The default cache size of 100,000 is inspired by the modern OLAP database
> > StarRocks and should suffice for most use cases. I think we can add a
> > note suggesting that tables with high-frequency inserts should set a larger
> > number for better performance.
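
The batch-refill behavior described in point 2 can be sketched as follows. This is only an illustration under stated assumptions: a plain `AtomicLong` stands in for the ZooKeeper-backed counter so that the example is self-contained, and `BucketIdCacheSketch` and `nextId` are hypothetical names, not part of the FIP.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-bucket ID cache; an AtomicLong stands in for the ZooKeeper counter. */
final class BucketIdCacheSketch {
    private final AtomicLong globalCounter; // assumption: replaces the distributed counter
    private final long batchSize;
    private long next;          // next ID to hand out
    private long endExclusive;  // first ID beyond the cached range (cache starts empty)

    BucketIdCacheSketch(AtomicLong globalCounter, long batchSize) {
        this.globalCounter = globalCounter;
        this.batchSize = batchSize;
    }

    /** Returns the next ID, reserving a new batch when the local cache is used up. */
    synchronized long nextId() {
        if (next >= endExclusive) {
            // Atomically reserve the range [end - batchSize + 1, end] from the shared counter.
            long end = globalCounter.addAndGet(batchSize);
            next = end - batchSize + 1;
            endExclusive = end + 1;
        }
        return next++;
    }
}
```

With a shared counter starting at 0 and a batch size of 100,000, the first bucket to ask receives [1, 100,000] and the second receives [100,001, 200,000]. IDs still cached when a bucket object is discarded are never handed out, which is where gaps in the sequence come from.
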
> >
> > Regards,
> > Cheng
> >
> > ------------------ Original ------------------
> > From: "dev" <[email protected]>
> > Date: Mon, Sep 22, 2025 10:51 PM
> > To: "dev" <[email protected]>
> >
> > Subject: Re: [DISCUSS] FIP-16: auto-increment column
> >
> > Hi Cheng, and thank you for driving this 🙏
> >
> > My first question, in terms of the API design, is also whether
> > the API .enableAutoIncrement() should take the column name as an argument,
> > so it’s more intuitive and clear.
> >
> > My extra comments are:
> > 1. What happens if a bucket reaches its threshold, i.e. it has a key range
> > [1, 100,000] and hits the upper bound? (If it’s mentioned and I missed it,
> > please ignore my comment.)
> >
> > 2. Based on my experience with Paimon, the record number (depending on the
> > record size) might range between 1 and 10 million records. In most of my
> > experiments, with autoscaling buckets, I always had 1 million rows per
> > bucket. So I’m thinking maybe it’s better to make the default threshold
> > larger.
> >
> > Best,
> > Giannis
> >
> > On Mon, 22 Sep 2025 at 3:41 PM, Wang Cheng <[email protected]> wrote:
> >
> > > Hi Mehul,
> > >
> > > Thanks for your comments.
> > >
> > > 1. When a tablet server restarts, its in-memory local cached IDs are
> > > lost. It will then invoke the add [1] method of ZooKeeper
> > > DistributedAtomicLong to request a new batch of IDs. ZooKeeper
> > > DistributedAtomicLong acts as a globally synchronized counter that only
> > > issues monotonically increasing values. If values of DistributedAtomicLong
> > > are exhausted, an error will be thrown.
> > > 2. Yes, if the tablet server holding bucket 1 (range 1–100,000) fails
> > > permanently, those cached but unused IDs are lost forever, creating gaps in
> > > the sequence. As highlighted in the proposal under "monotonicity", Fluss
> > > does not guarantee that the values for the AUTO_INCREMENT column are
> > > strictly monotonic, in order to prioritize performance and simplicity. It
> > > can only be ensured that the values roughly increase in chronological order.
> > > 3. In your scenario, once both requests confirm that the target primary
> > > key does not exist, they will proceed to initiate an insert operation.
> > > However, a write lock in the insertion path acts as a safeguard against
> > > concurrent write conflicts. Crucially, after a request successfully
> > > acquires the write lock, it must recheck the existence of the primary key
> > > once more before proceeding with the actual insert. This two-step
> > > verification coupled with the write lock ensures that only one request can
> > > ultimately complete the insertion, thereby preventing the generation of
> > > duplicate auto-increment IDs.
> > > 4. The cache size should be tuned based on insert volume. For
> > > high-frequency insert operations, a larger cache is recommended for optimal
> > > performance.
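
The check, lock, and recheck sequence from point 3 can be sketched like this. It is a simplified single-process model under assumptions: the names `InsertIfNotExistsSketch` and `lookupOrInsert` are hypothetical, the first lookup is done under a read lock (the thread does not say how it is performed), and an in-process counter replaces the real ID source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of "look up, take the write lock, recheck, then insert". */
final class InsertIfNotExistsSketch {
    private final Map<String, Long> idsByKey = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Stand-in for the real ID source (assumption: a simple in-process counter).
    private final AtomicLong nextId = new AtomicLong(1);

    /** Returns the ID for the key, inserting a new row only if none exists. */
    long lookupOrInsert(String primaryKey) {
        // Step 1: first lookup. Concurrent callers may all see "not found" here.
        lock.readLock().lock();
        try {
            Long existing = idsByKey.get(primaryKey);
            if (existing != null) {
                return existing;
            }
        } finally {
            lock.readLock().unlock();
        }
        // Step 2: take the write lock and recheck before inserting.
        lock.writeLock().lock();
        try {
            Long existing = idsByKey.get(primaryKey);
            if (existing != null) {
                return existing; // another request won the race; reuse its ID
            }
            long id = nextId.getAndIncrement();
            idsByKey.put(primaryKey, id);
            return id;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Because an ID is only drawn after the recheck succeeds under the write lock, concurrent requests for the same missing key consume exactly one ID.
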
> > >
> > > [1]
> > > https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/atomic/DistributedAtomicLong.html#add(java.lang.Long)
> > >
> > > Regards,
> > > Cheng
> > >
> > > ------------------ Original ------------------
> > > From: "dev" <[email protected]>
> > > Date: Sun, Sep 21, 2025 04:55 AM
> > > To: "dev" <[email protected]>
> > >
> > > Subject: Re: [DISCUSS] FIP-16: auto-increment column
> > >
> > > Hi Cheng,
> > >
> > > Thanks for driving this. It's a needed feature and a leap forward in making
> > > the stack production-ready for real-world scenarios.
> > > The design made sense to me. I have a few small questions:
> > >
> > > - *Cache Coordination*: When a tablet server fails and its cached IDs
> > > (e.g., 50,000-100,000) are lost, how does ZooKeeper ensure those IDs are
> > > never reused? Does it maintain a global highest allocated counter?
> > > - *Cross-bucket Dependencies*: In the example, bucket 1 gets [1-100,000]
> > > and bucket 2 gets [100,001-200,000]. What happens if the tablet server
> > > containing bucket 1 goes down permanently? Will there always be gaps in the
> > > sequence?
> > > - *Race Conditions*: If two Flink workers simultaneously look up the same
> > > non-existent primary key, could both trigger insertIfNotExists and create
> > > duplicate auto-increment IDs? How is this prevented?
> > > - How should users decide the right table.auto_inc_cache_size? Should we
> > > put a max cap on this to avoid overburden?
> > >
> > > Best Regards,
> > > Mehul Batra
> > >
> > > On Fri, Sep 19, 2025 at 5:24 PM Yang Wang <[email protected]> wrote:
> > >
> > > > Hi Cheng,
> > > >
> > > > Thank you for driving this FIP. I think it is a nice and important feature
> > > > for many real-world business scenarios, and the overall design makes sense
> > > > to me. I have just one small question:
> > > > Regarding the client-side API design:
> > > > ```
> > > > Schema.newBuilder()
> > > >         .column("uid", DataTypes.STRING())
> > > >         .column("uid_int64", DataTypes.BIGINT())
> > > >         .enableAutoIncrement()
> > > >         .primaryKey("uid")
> > > >         .build();
> > > > ```
> > > > If there is more than one column with INT or BIGINT type, which one would
> > > > be the auto-increment column?
> > > >
> > > > Best regards,
> > > > Yang
> > > >
> > > > Wang Cheng <[email protected]> wrote on Thu, Sep 18, 2025 at 22:49:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Auto-increment column is a bread-and-butter feature for improving data
> > > > > management efficiency. It is the bedrock of many features in analytical
> > > > > workloads, such as those in real-time unique visitor (UV) counting
> > > > > scenarios.
> > > > >
> > > > > To implement this capability, I'd like to propose FIP-16: auto-increment
> > > > > column [1].
> > > > >
> > > > > Any feedback and suggestions on this proposal are welcome!
> > > > >
> > > > > [1]:
> > > > > https://cwiki.apache.org/confluence/display/FLUSS/FIP-16%3A+Auto-Increment+Column
> > > > >
> > > > > Regards,
> > > > > Cheng
