Sounds like there's a general consensus that Fineract should (must?) move away from using monotonically increasing, DB-derived auto-incrementing integers as primary keys for storage / retrieval / indexing, because they're already causing problems today. Is that correct? That seems like the first thing worth voting on / clarifying, if there's any doubt at all.
Assuming yes, with this change are we mainly serving/targeting a "typical" Fineract use case, or some huge/edge/future use case in terms of scale/security? Are there any extant numbers to describe "typical", "large scale", and "edge case" Fineract deployments? If not, how hard is it to get some useful data on this?

How important is the rough time-sortability of primary keys if snowflake IDs are used? Sounds like a "nice to have"... surely we already have timestamps on records anyway.

How likely is a collision with snowflake IDs? If collisions are somehow likely, how hard is it to handle them?

How big an issue is clock skew/jump in practice for snowflake IDs? Victor's right here that sysadmins generally need clocks in sync across all machines (bare metal or VMs) for many reasons (e.g. accurately timestamped log messages), so it would be useful to know what the actual tolerance is.

I have roughly the same question about the machine ID... 2^10 (1,024) possible values seems like plenty. If our target is, say, "clusters of one to five machines", this is a non-issue (1-5 machines is just a silly guess; we need real data). Also, if we don't care about data center / worker ID (and why would we?), perhaps we can come up with something more useful for that part of the snowflake ID. I say this realizing it could be a deep comp sci rabbit hole, but that might just be where we're at with this.

Is there some useful way we can describe actual in-practice scaling needs for Fineract deployments? Actual memory, network, and on-disk storage needs and capacities would surely help in deciding whether 64 bits vs. 128 bits is consequential.

Paul, thank you for sharing a ton of past experience / lessons learned! I confess I don't understand all of it or how relevant it is, but I'd like to.
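To make the snowflake questions above a bit more concrete, here's a rough Java sketch of the commonly described 64-bit layout (41-bit millisecond timestamp, 10-bit machine ID, 12-bit per-millisecond sequence). This is not Fineract code and not a proposal: the class name, the custom epoch, and the choice to throw instead of waiting are all my own assumptions for illustration. It's mainly meant to show where collisions and clock jumps would actually bite.

```java
/**
 * Illustrative sketch only, NOT Fineract code.
 * Assumes the commonly described snowflake layout:
 *   41 bits timestamp (ms since a custom epoch) | 10 bits machine ID | 12 bits sequence
 */
final class SnowflakeSketch {
    static final int MACHINE_BITS = 10;   // 2^10 = 1,024 possible machine IDs
    static final int SEQUENCE_BITS = 12;  // 2^12 = 4,096 IDs per machine per millisecond
    static final long MAX_MACHINE = (1L << MACHINE_BITS) - 1;
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;
    // Hypothetical custom epoch (2020-01-01T00:00:00Z); any fixed past instant would do.
    static final long EPOCH_MS = 1577836800000L;

    private final long machineId;
    private long lastMs = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long machineId) {
        if (machineId < 0 || machineId > MAX_MACHINE) {
            throw new IllegalArgumentException("machineId must be in 0.." + MAX_MACHINE);
        }
        this.machineId = machineId;
    }

    /** Generates the next ID; the clock reading is passed in so the behavior is easy to test. */
    synchronized long nextId(long nowMs) {
        if (nowMs < EPOCH_MS) {
            throw new IllegalArgumentException("clock reading is before the custom epoch");
        }
        if (nowMs < lastMs) {
            // Clock jumped backwards: refusing is the simplest way to avoid duplicates.
            throw new IllegalStateException("clock moved backwards by " + (lastMs - nowMs) + " ms");
        }
        if (nowMs == lastMs) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // A real generator would typically wait for the next millisecond instead.
                throw new IllegalStateException("sequence exhausted for this millisecond");
            }
        } else {
            sequence = 0;
        }
        lastMs = nowMs;
        return ((nowMs - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }

    static long timestampMs(long id) {
        return (id >>> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS;
    }

    static long machineOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_MACHINE;
    }

    static long sequenceOf(long id) {
        return id & MAX_SEQUENCE;
    }

    public static void main(String[] args) {
        SnowflakeSketch generator = new SnowflakeSketch(3);
        long id = generator.nextId(System.currentTimeMillis());
        System.out.println("id=" + id
                + " timestampMs=" + timestampMs(id)
                + " machine=" + machineOf(id)
                + " sequence=" + sequenceOf(id));
    }
}
```

In this sketch, two IDs can only collide if two generators share a machine ID or if a clock steps backwards and the generator keeps going anyway; the second case is why it throws. The 41-bit timestamp covers roughly 69 years from whatever epoch is chosen, and the 10 machine bits could in principle be repurposed if data center / worker IDs turn out not to matter for us.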
After skimming the discussion so far, I'm leaning towards snowflake IDs because we don't want to target "perfection" and eschew "good enough", but I feel like I may be missing something Paul had to deal with that would end up being a problem for Fineract. I also have a gut feeling that UUIDs would somehow be safer in the end, because I'm having trouble finding useful academic research/case studies on snowflake IDs. Still, I'm +1 on snowflake IDs given the relative ease of migration, and I'm not convinced of the drawbacks (again, not knowing actual Fineract use cases).

Maybe we should do a video call? I'd love to learn more. Fascinating discussion, folks!

-- 
Adam Monsen
Software Engineer ~ Mifos Initiative
Apache Fineract Release Manager
PGP key id 0xA9A14F22F57DA182
