Databases store rows as key-value pairs keyed by the primary key. While
UUIDs or Snowflake IDs can marginally improve primary-key lookup times via
O(log n) algorithms, this benefits at most 10% of queries. The remaining
90% of searches—those performed on columns or fields—still require scanning
O(n) rows, where such an approach offers no improvement. In practical
terms, adopting UUIDs or Snowflake IDs does not yield measurable
performance gains for the majority of operations and risks creating the
impression of optimization without substantive effect.

On Thu, Sep 4, 2025 at 2:26 PM Ádám Sághy <[email protected]> wrote:

> Dear Fineract community,
>
> I’d like to open a discussion on our current use of auto-increment
> database primary keys (e.g. BIGINT IDENTITY) and whether it’s time to
> evolve this design.
>
> *Current Issues with Auto-Increment PKs*
>
>    - Always requires flush/commit: The database generates the ID only
>    once the transaction is flushed or committed, which makes it difficult to
>    work with new entities before persistence. This complicates batching,
>    caching, and testability.
>    - Performance implications: Because IDs are only known after flush,
>    code often performs additional flushes just to obtain generated IDs. This
>    introduces unnecessary round-trips and slows down persistence at scale.
>    Also this cause various issues around having multiple flushes per
>    transactions.
>    - Predictability: Sequential IDs are trivially guessable. When exposed
>    via APIs, this can pose a security and privacy concern.
>    - Operational flexibility: While a single database removes the need
>    for distributed coordination, relying solely on DB-generated sequences ties
>    identity generation to persistence, which reduces flexibility when
>    considering service decomposition, replication, or multi-tenant setups.
>
>
> *Recommendation: Snowflake-style BIGINT IDs*
>
> Application-level ID generation using a Snowflake-style algorithm (64-bit
> IDs with timestamp + machine ID + sequence).
>
> Pros:
>
>    - IDs are available immediately in memory, no flush required.
>    - Globally unique across instances.
>    - Roughly time-ordered, which helps with debugging and querying.
>    - Removes guessability of plain sequential IDs.
>    - Still compact BIGINT for indexing and joins.
>    - *No need for migration!*
>
> Cons:
>
>    - Requires a generator implementation and clock synchronization
>    safeguards.
>    - IDs are less human-friendly than sequential integers.
>
>
>
> Looking forward to hear your thoughts on this proposal!
>
>
> Regards,
>
> Adam
>
>
>

Reply via email to