Re: PROPOSAL: Moving away from DB-enforced auto-increment primary keys

Aleksandar Vidakovic Fri, 05 Sep 2025 18:51:50 -0700

+1 for UUIDv7

On Fri, Sep 5, 2025 at 11:11 PM James Dailey <[email protected]> wrote:


> Adam and all
>
> I’m convinced this is worthy of a write up as an FSIP (Fineract
> SIGNIFICANT Improvement Project).  We added this process knowing that it is
> sometimes important to hold a VOTE and do additional discussion.
>
> I am +1 on the discussion so far as well.
>
> As to the approach, I agree w Tom’s comment.  To resolve such ambiguity it
> should be done in a POC to provide additional assurance. Is it reasonable
> to start with that?
>
> I still haven’t heard anyone identify a downstream impact to smaller
> installations and third party apps, so assuming this has been looked at.
> It may be useful to write this up and provide a link in the usual places
> such that those parties who don’t monitor this may see the change coming.
>  (Again, another use for the FSIP.).
>
> Thanks
>
> Sent from Gmail Mobile
>
>
> On Fri, Sep 5, 2025 at 11:31 AM Tom Daly <[email protected]> wrote:
>
>> Hi Victor, Adam et al,
>>
>> There seems to be some general acknowledgment about the risks of handling
>> ID generation at the app level rather than the DB level again particularly
>> for clusters but I am also I suppose hearing that it is necessary.   I know
>> little about the deployment cases and operational challenges for real
>> customers so I have to defer to you Victor and others who say this is
>> necessary.
>>
>> Still, to my second question then, which is related to "risk" and that is
>> how is any new ID generation scheme going to be tested under load and made
>> bullet proof ?   What load generation tools exist for testing under load in
>> both clustered and non-clustered scenarios to make sure that this works
>> because (to say the least) it is not trivial.
>>
>> regards
>> Tom
>>
>>
>>
>>
>> On 5 Sep 2025, at 9:25 am, VICTOR MANUEL ROMERO RODRIGUEZ <
>> [email protected]> wrote:
>>
>> Hi Adam,
>>
>> I prefer to go for this approach.
>>
>>
>>    -
>>
>>    *Snowflake BIGINT* → no migration, performance-friendly, but requires
>>    care around machine ID/clock handling.
>>
>> The machine Id and clock handling are items, that I consider, are already
>> managed in production environments. And if the system is already connected
>> to a payment system, the IT admins are already aware or are more familiar
>> about having clock synchronizarion while working with distributed systems
>> connected to Apache Fineract.
>>
>> Regards
>>
>> Victor
>>
>>
>> El El vie, 5 de sep de 2025 a la(s) 2:03 a.m., Ádám Sághy <
>> [email protected]> escribió:
>>
>>> Hi Paul, Tom, James, Victor,
>>>
>>> Thanks again for sharing your experience!
>>>
>>> I see why UUIDv7 can be attractive: time-ordered, globally unique, and
>>> avoids the pitfalls of machine IDs and clock sync.
>>>
>>> That said, one major consideration for us is migration effort.
>>>
>>>    -
>>>
>>>    *UUIDv7* would require a *full schema and data migration*, touching
>>>    every table, index, and foreign key. For a system like Fineract this is a
>>>    massive undertaking, realistically close to a year of work before things
>>>    are stable, with high risk and ongoing maintenance cost for projects that
>>>    are using Fineract for a while or using it with massive amount of data.
>>>    -
>>>
>>>    *Snowflake-style BIGINTs*, on the other hand, can be introduced
>>>    without changing the existing PK data type. We can keep the schema as-is,
>>>    avoid re-indexing terabytes of data, and stay compatible with all current
>>>    integrations.
>>>
>>> Performance is the other angle: UUIDv7 doubles index size (128-bit vs
>>> 64-bit) and tends to fragment B-tree indexes more than sequential or
>>> roughly sequential IDs. Snowflake IDs remain compact BIGINTs, time-ordered
>>> enough for efficient indexing, and guarantee immediate availability in
>>> memory without requiring a flush.
>>>
>>> There’s another very important angle I’d like to add to this discussion:
>>> our current *PK generation strategy*. Today Fineract relies on IDENTITY 
>>> columns,
>>> which means the database only provides an ID during flush/commit. The
>>> backend cannot obtain an ID before that point. Unfortunately, the platform
>>> depends heavily on having an ID *earlier* in the lifecycle for things
>>> like:
>>>
>>>    -
>>>
>>>    Creating and sending business events with the new entity ID
>>>    -
>>>
>>>    Populating result objects with the resource ID immediately after
>>>    creation
>>>    -
>>>
>>>    Enabling true batch write operations (currently blocked)
>>>    -
>>>
>>>    Avoiding unnecessary flushes purely to retrieve generated IDs
>>>
>>> This limitation has significant impact on performance, testability, and
>>> developer experience.
>>>
>>> That’s why I’ve been leaning toward application-level ID generation
>>> (Snowflake-style BIGINTs). They solve the “flush dependency” problem while
>>> retaining our current schema and data type, so *no massive migration* is
>>> required. UUIDv7 would address uniqueness and ordering, but it would mean
>>> replacing all PK columns, indexes, and FKs — realistically a year-long,
>>> high-risk migration for a project of this size.
>>>
>>> Of course, Snowflake IDs aren’t perfect — after ~50 years we’d
>>> eventually hit their limits, but realistically no system of this kind goes
>>> half a century without a redesign anyway.
>>>
>>> So to me the trade-off looks like:
>>>
>>>    -
>>>
>>>    *UUIDv7* → simpler conceptually, but very heavy migration + larger
>>>    index/storage footprint.
>>>    -
>>>
>>>    *Snowflake BIGINT* → no migration, performance-friendly, but
>>>    requires care around machine ID/clock handling.
>>>
>>>
>>> *At the end the question is whether we want to fix 1 problem with a
>>> short implementation timeline, or fix 2 problems with a pretty lengthy
>>> implementation timeline and with the risk of a very complex data migration!*
>>>
>>>
>>> *As an ending note, I really love this conversation and all the ideas
>>> shared — that was exactly my intention in starting the thread. Please keep
>>> it going; we should consider all angles and hear all voices in order to
>>> select the best option for the community and the project!*
>>>
>>> Best,
>>> Adam
>>>
>>> On 2025. Sep 4., at 13:36, Paul <[email protected]> wrote:
>>>
>>> Hi Adam,
>>> I'm not super technical, but have faced similar issues in the past and
>>> will try to regurgitate what we discovered as a "better way" which might be
>>> useful.
>>>
>>> First, let me share using the Machine ID in the sequence ends up being
>>> an ugly pain in the ass point and was a near catastrophic issue for us
>>> . . . so while your solution would be an improvement, it creates new risks
>>> which are just as onerous. That was on metal servers. With cloud services,
>>> scaling and churn would seem to multiply the risk? While I'm sure there are
>>> ways around each challenge, again, each "way around" complicates the
>>> solution and worse, increases time, cost and cognitive load.
>>>
>>> I going to suggest (PLEASE double check my sanity) 128-bit IDs using
>>> time-ordered UUIDs v7. This provides global uniqueness with the only
>>> negative being a slight increase in index size (16 vs 8 bit). It should
>>> avoid several "what ifs" on clock issues (jumps and fall backs) and risk of
>>> using Machin ID, you will discover a LOT of extra effort is required to
>>> make those work. THEN after ALL that, will find that uniqueness is NOT
>>> certain. I found this out the hard way. In short, we tried what sounded
>>> like your method and failed to enjoy a truly scalable solution. In fact, we
>>> made 3 major efforts to get it to "play nice" and failed.
>>>
>>> This simplified method provides immediate ops benefit, is a low risk
>>> implementation. (Lead Developer, suggested it was zero risk, but I don't
>>> beleive in such a thing)  We were working with Postgres, Java app, NOT
>>> Fineract, but assuming its generally applicable.
>>>
>>> Assumptions:
>>> 1) You're not trying to remove or replace primary keys and will keep
>>> them for internal use
>>> 2) Postgres handles indexing of uuid auto-magically; as long as the
>>> column is marked unique
>>> 3) Use uuid as the public ID in APIs
>>>
>>> If you think this is helpful or a better option, feel free to quiz me.
>>>
>>> Regards
>>> Paul
>>>
>>>
>>>
>>

Re: PROPOSAL: Moving away from DB-enforced auto-increment primary keys

Reply via email to