Re: [DISCUSS] FIP-16: auto-increment column

Wang Cheng Mon, 27 Oct 2025 02:27:38 -0700

Thank you Jark for your comments.


> I think we need to support INT type for AUTO_INCREMENT column, because
> roaring bitmap32 is more commonly used because it is enough for most cases and
> cheaper than rbm64.


I agree with this point. Roaring Bitmap32 is more prevalent due to its better 
performance. I have revised the FIP to specify that the data type of the 
AUTO_INCREMENT column must be INT or BIGINT.
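
For illustration only (this is not text from the FIP): allowing INT means the ID space is capped by the column type, so allocation presumably has to fail once the range is used up rather than overflow. A minimal sketch, where `AutoIncType` and `checkInRange` are hypothetical names and IDs are assumed to be non-negative:

```java
/** Hypothetical illustration; neither the enum nor the method is part of the FIP. */
enum AutoIncType {
    INT(Integer.MAX_VALUE),
    BIGINT(Long.MAX_VALUE);

    final long maxValue;

    AutoIncType(long maxValue) {
        this.maxValue = maxValue;
    }

    /** Returns the id unchanged if it fits the column type, otherwise throws. */
    long checkInRange(long id) {
        // Assumption: ids are non-negative, so a negative value signals overflow.
        if (id < 0 || id > maxValue) {
            throw new IllegalStateException(
                    "auto-increment id " + id + " is out of range for " + this);
        }
        return id;
    }
}
```

With this shape, an INT column stops accepting new IDs after 2,147,483,647, while a BIGINT column keeps going up to the limit of a signed 64-bit value.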


> The znode looks like an index node for the table; renaming it to
> something more descriptive would improve clarity:
> /metadata/databases/[databaseName]/tables/[tableName]/auto_inc/col_[columnIdx]


I have renamed the znode path according to your suggestion.
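
For illustration, the renamed path can be assembled as below. `AutoIncZnodePaths` and `columnPath` are hypothetical names for this sketch, not existing Fluss classes:

```java
/** Hypothetical helper; only the path layout itself comes from this thread. */
final class AutoIncZnodePaths {
    private AutoIncZnodePaths() {}

    /** Builds the znode path that holds the counter of one auto-increment column. */
    static String columnPath(String databaseName, String tableName, int columnIdx) {
        return "/metadata/databases/" + databaseName
                + "/tables/" + tableName
                + "/auto_inc/col_" + columnIdx;
    }
}
```

For example, `columnPath("db1", "orders", 2)` yields `/metadata/databases/db1/tables/orders/auto_inc/col_2`.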


> What happens to other columns? If this only updates other columns except
> the auto-incremented column, this will be a very strange behavior and conflicts
> with the upsert semantics. According to the previous discussion, we don't allow
> insert/upsert on the auto-incremented column, so this should throw an exception
> directly without checking the nullability of the auto-incremented column.


I think there is some ambiguity in the UPSERT section. The UPSERT behavior has 
been clarified as follows: "Assigning values to an AUTO_INCREMENT column during 
UPSERT operations is prohibited, irrespective of target row existence. If the 
row is newly inserted into the table, Fluss fills the AUTO_INCREMENT column 
with a new auto-incremented ID."
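
To make the rule concrete, here is a minimal in-memory sketch. `UpsertSketch` is a hypothetical class, the column name `uid_int64` is borrowed from the schema example earlier in this thread, and keeping the previously assigned ID for an existing row is an assumption based on the earlier draft wording ("If the row already exists in the table, Fluss does not update the auto-incremented ID"):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the UPSERT rule; not Fluss code. */
final class UpsertSketch {
    static final String AUTO_INC_COLUMN = "uid_int64";

    private final Map<String, Map<String, Object>> rowsByKey = new HashMap<>();
    private long nextId = 1L;

    /** Upserts a row by primary key and returns the row as stored. */
    Map<String, Object> upsert(String primaryKey, Map<String, Object> values) {
        // Rejected regardless of whether the target row already exists.
        if (values.containsKey(AUTO_INC_COLUMN)) {
            throw new IllegalArgumentException(
                    "assigning a value to " + AUTO_INC_COLUMN + " is prohibited");
        }
        Map<String, Object> existing = rowsByKey.get(primaryKey);
        Map<String, Object> row = new HashMap<>(values);
        if (existing == null) {
            // Newly inserted row: fill the column with a fresh id.
            row.put(AUTO_INC_COLUMN, nextId++);
        } else {
            // Assumption: an existing row keeps the id it was given on insert.
            row.put(AUTO_INC_COLUMN, existing.get(AUTO_INC_COLUMN));
        }
        rowsByKey.put(primaryKey, row);
        return row;
    }
}
```

In this sketch, upserting the same key twice updates the other columns but leaves the ID untouched, and any attempt to pass `uid_int64` explicitly fails before the row lookup.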


> I suggest not introducing the `AutoIncIDBuffer` in the first version, or
> at least not introducing these 2 options, as they become public API, because
> this may conflict with the persisted ID approach that we will soon introduce.


I have removed those 2 options from the FIP, since they may not be 
compatible with our upcoming bucket state snapshot feature. I have also revised 
the failover handling section accordingly: "To eliminate gaps during failover, 
local cached IDs will be persisted as part of the upcoming bucket state 
snapshot, ensuring their availability after server restart."
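
As a rough sketch of why persisting the cached range matters: the model below hands out IDs in batches from a global counter, as described earlier in this thread. All names (`IdBatchSketch`, `GlobalCounter`, `BucketIdCache`) are hypothetical, a plain `AtomicLong` stands in for the ZooKeeper-backed counter, and the snapshot mechanism itself is not modeled (the second constructor simply accepts the restored range):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of batch id allocation; not the FIP's implementation. */
final class IdBatchSketch {
    private IdBatchSketch() {}

    /** Stand-in for the global counter (an in-process AtomicLong in this sketch). */
    static final class GlobalCounter {
        private final AtomicLong allocated = new AtomicLong(0);

        /** Reserves batchSize ids and returns the first id of the reserved range. */
        long reserve(long batchSize) {
            return allocated.getAndAdd(batchSize) + 1;
        }
    }

    /** Per-bucket cache of a reserved id range; not thread-safe. */
    static final class BucketIdCache {
        private final GlobalCounter counter;
        private final long batchSize;
        private long next;
        private long lastInclusive;

        /** Fresh cache with no ids, as after a restart without persisted state. */
        BucketIdCache(GlobalCounter counter, long batchSize) {
            this(counter, batchSize, 1L, 0L); // empty range forces a fetch on first use
        }

        /** Cache restored from a previously saved range [next, lastInclusive]. */
        BucketIdCache(GlobalCounter counter, long batchSize, long next, long lastInclusive) {
            this.counter = counter;
            this.batchSize = batchSize;
            this.next = next;
            this.lastInclusive = lastInclusive;
        }

        long nextId() {
            if (next > lastInclusive) {
                // Local range used up: request a new batch from the global counter.
                next = counter.reserve(batchSize);
                lastInclusive = next + batchSize - 1;
            }
            return next++;
        }
    }
}
```

With a batch size of 100, a cache that has handed out IDs 1 and 2 and is then recreated from scratch continues at 101, leaving 3 to 100 unused. Recreating it from the saved range (3, 100) continues at 3, which is the effect the revised failover section aims for.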





Regards,
Cheng



------------------ Original ------------------
From: "dev" <[email protected]>;
Date: Sun, Oct 26, 2025 04:33 PM
To: "dev" <[email protected]>;

Subject: Re: [DISCUSS] FIP-16: auto-increment column



Hi Cheng,

Sorry for interrupting the vote. However, I spotted some issues we may
need to address.

> The data type of the AUTO_INCREMENT column must be BIGINT.

I think we need to support the INT type for the AUTO_INCREMENT column,
because roaringbitmap32 is more commonly used: it is enough for most cases
and cheaper than rbm64.

> For UPSERT operations, the following situations occur: If the row already
> exists in the table, Fluss does not update the auto-incremented ID.

What happens to other columns? If this only updates other columns except
the auto-incremented column, this will be a very strange behavior and
conflicts with the upsert semantics.
According to the previous discussion, we don't allow insert/upsert on the
auto-incremented column, so this should throw an exception directly without
checking the nullability of the auto-incremented column.

> ZooKeeper with the znode path being
> /metadata/databases/[databaseName]/tables/[tableName]/autoinc/idx_[columnIdx]

The znode looks like an index node for the table; renaming it to
something more descriptive would improve clarity:
/metadata/databases/[databaseName]/tables/[tableName]/auto_inc/col_[columnIdx]

> The prefetch batch size and low watermark ratio are controlled by
> configuration parameters table.auto_inc_cache_size and
> table.auto_inc_low_water_mark_size_ratio respectively.

I suggest not introducing the `AutoIncIDBuffer` in the first version, or at
least not introducing these 2 options, as they become public API, because
this may conflict with the persisted ID approach that we will soon
introduce.


Best,
Jark

On Tue, 23 Sept 2025 at 10:13, Wang Cheng <[email protected]> wrote:

> Hi Giannis,
>
> Thanks for your comments.
>
> 1. That makes sense. I'll update the enableAutoIncrement() method to
> accept the column name as a parameter.
> 2. Once the locally cached IDs are used up, the bucket will request a new
> batch from ZooKeeper.
> 3. The default cache size of 100,000 is inspired by the modern OLAP database
> StarRocks and should suffice for most use cases. I think we can add a
> note suggesting that tables with high-frequency inserts should set a larger
> value for better performance.
>
> Regards,
> Cheng
>
> ------------------ Original ------------------
> From: "dev" <[email protected]>;
> Date: Mon, Sep 22, 2025 10:51 PM
> To: "dev" <[email protected]>;
>
> Subject: Re: [DISCUSS] FIP-16: auto-increment column
>
> Hi Cheng, and thank you for driving this.
>
> My first question, in terms of the API design, is whether
> .enableAutoIncrement() should take the column name as an argument,
> so it's more intuitive and clear.
>
> My extra comments are:
> 1. What happens if a bucket reaches its threshold, i.e. it has a key range
> [1, 100,000] and hits the upper bound? (If it's mentioned and I missed it,
> please ignore my comment.)
>
> 2. Based on my experience with Paimon, the record count (depending on the
> record size) might range between 1 and 10 million records. In most of my
> experiments with autoscaling buckets, I always had 1 million rows per
> bucket. So I'm thinking it may be better to make the default threshold
> larger.
>
> Best,
> Giannis
>
> On Mon, 22 Sep 2025 at 3:41 PM, Wang Cheng <[email protected]> wrote:
>
> > Hi Mehul,
> >
> > Thanks for your comments.
> >
> > 1. When a tablet server restarts, its in-memory locally cached IDs are
> > lost. It will then invoke the add [1] method of ZooKeeper
> > DistributedAtomicLong to request a new batch of IDs. ZooKeeper
> > DistributedAtomicLong acts as a "globally synchronized counter" that only
> > issues monotonically increasing values. If the values of
> > DistributedAtomicLong are exhausted, an error will be thrown.
> > 2. Yes, if the tablet server holding bucket 1 (range 1-100,000) fails
> > permanently, those cached but unused IDs are lost forever, creating gaps
> > in the sequence. As highlighted in the proposal under "monotonicity",
> > Fluss does not guarantee that the values for the AUTO_INCREMENT column are
> > strictly monotonic, in order to prioritize performance and simplicity. It
> > can only be ensured that the values roughly increase in chronological
> > order.
> > 3. In your scenario, once both requests confirm that the target primary
> > key does not exist, they will proceed to initiate an insert operation.
> > However, a write lock in the insertion path acts as a safeguard against
> > concurrent write conflicts. Crucially, after a request successfully
> > acquires the write lock, it must recheck the existence of the primary key
> > once more before proceeding with the actual insert. This two-step
> > verification coupled with the write lock ensures that only one request can
> > ultimately complete the insertion, thereby preventing the generation of
> > duplicate auto-increment IDs.
> > 4. The cache size should be tuned based on insert volume. For
> > high-frequency insert operations, a larger cache is recommended for
> > optimal performance.
> >
> > [1]
> > https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/atomic/DistributedAtomicLong.html#add(java.lang.Long)
> >
> > Regards,
> > Cheng
> >
> > ------------------ Original ------------------
> > From: "dev" <[email protected]>;
> > Date: Sun, Sep 21, 2025 04:55 AM
> > To: "dev" <[email protected]>;
> >
> > Subject: Re: [DISCUSS] FIP-16: auto-increment column
> >
> > Hi Cheng,
> >
> > Thanks for driving this; it's a much-needed feature and a leap forward in
> > making the stack production-ready for real-world scenarios.
> > The design makes sense to me. I have a few small questions:
> >
> > - *Cache Coordination*: When a tablet server fails and its cached IDs
> > (e.g., 50,000-100,000) are lost, how does ZooKeeper ensure those IDs are
> > never reused? Does it maintain a global highest-allocated counter?
> > - *Cross-bucket Dependencies*: In the example, bucket 1 gets [1-100,000]
> > and bucket 2 gets [100,001-200,000]. What happens if the tablet server
> > containing bucket 1 goes down permanently? Will there always be gaps in
> > the sequence?
> > - *Race Conditions*: If two Flink workers simultaneously look up the same
> > non-existent primary key, could both trigger insertIfNotExists and create
> > duplicate auto-increment IDs? How is this prevented?
> > - How should users decide the right table.auto_inc_cache_size? Should we
> > put a max cap on this to avoid overburden?
> >
> > Best Regards,
> > Mehul Batra
> >
> > On Fri, Sep 19, 2025 at 5:24 PM Yang Wang <[email protected]> wrote:
> >
> > > Hi Cheng,
> > >
> > > Thank you for driving this FIP. I think it is a nice and important
> > > feature for many real-world business scenarios, and the overall design
> > > makes sense to me. I have just one small question:
> > > Regarding the client-side API design:
> > > ```
> > > Schema.newBuilder()
> > >         .column("uid", DataTypes.STRING())
> > >         .column("uid_int64", DataTypes.BIGINT())
> > >         .enableAutoIncrement()
> > >         .primaryKey("uid")
> > >         .build();
> > > ```
> > > If there is more than one column with INT or BIGINT type, which one
> > > would be the auto-increment column?
> > >
> > > Best regards,
> > > Yang
> > >
> > > Wang Cheng <[email protected]> wrote on Sep 18, 2025 at 22:49:
> > >
> > > > Hi all,
> > > >
> > > > Auto-increment column is a bread-and-butter feature for improving data
> > > > management efficiency. It is the bedrock of many features in analytical
> > > > workloads, such as those in real-time unique visitor (UV) counting
> > > > scenarios.
> > > >
> > > > To implement this capability, I'd like to propose FIP-16:
> > > > auto-increment column [1].
> > > >
> > > > Any feedback and suggestions on this proposal are welcome!
> > > >
> > > > [1]:
> > > > https://cwiki.apache.org/confluence/display/FLUSS/FIP-16%3A+Auto-Increment+Column
> > > >
> > > > Regards,
> > > > Cheng
