Sounds like there's a general consensus that Fineract should (must?) move
away from using monotonically increasing / db-derived auto-incrementing
integers as primary keys for storage / retrieval / indexing because it's
already causing problems today. Is that correct? Seems like that's the
first thing worth voting on / clarifying, if there's even a doubt.

Assuming yes, with this change are we mainly serving/targeting a "typical"
Fineract use case or some huge/edge/future use case in terms of
scale/security? Are there any extant numbers to describe "typical", "large
scale", "edge case" Fineract deployments? If not, how hard is it to get
some useful data on this?

How important is the rough time sort-ability aspect of primary keys if
snowflake IDs are used? Sounds like a "nice to have"... surely we
already/also have timestamps for records.

How likely is a collision with snowflake IDs? If collisions are somehow
likely, how hard is it to handle them?
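
To make the collision question concrete, here is a minimal Python sketch of
how I understand a snowflake ID to be put together. It assumes the original
Twitter layout (41-bit millisecond timestamp, 10-bit machine ID, 12-bit
sequence), which may not be what Fineract ends up with, and the function
names are mine. If this is right, two IDs can only collide when timestamp,
machine ID and sequence number are all identical.

```python
# Assumed layout: the original Twitter Snowflake split. Fineract might
# choose different widths; these constants are illustrative only.
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12


def compose(timestamp_ms, machine_id, sequence):
    """Pack the three fields into one non-negative 63-bit integer."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine ID out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def decompose(snowflake):
    """Split an ID back into (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = snowflake >> (MACHINE_BITS + SEQUENCE_BITS)
    return timestamp_ms, machine_id, sequence
```

Under this layout, the realistic collision scenarios would be two nodes
configured with the same machine ID, or one node reusing a timestamp after
its clock steps backwards. Because the timestamp occupies the high bits, any
ID from a later millisecond also compares greater than any ID from an
earlier one, which is where the rough sort-ability comes from.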

How big an issue is clock skew/jump in practice for snowflake IDs? Victor's
right here that sysadmins generally need all clocks in sync for all
machines (bare metal or VMs) for many reasons (e.g. accurately timestamped
log messages), so it would be useful to know what the actual tolerance is.
I have roughly the same question for machine ID... 2^10 (1,024) possible
values seems like plenty. If our target is, say, "clusters of one to five
machines", this is a non-issue (1-5 machines is just a silly guess -- need
real data). Also if we don't care about data center / worker ID (and why
would we?) perhaps we can come up with something more useful for this part
of the snowflake IDs. I say this realizing it could be a deep comp sci
rabbit hole, but that might just be where we're at with this.
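
As a way of framing the clock question, here is a rough Python sketch of a
generator that simply refuses to issue IDs when the clock steps backwards.
This is one common policy, not necessarily what any particular library (or
Fineract) does. The class name, the injectable `clock` parameter and the
`epoch_ms` parameter are all hypothetical, and the 41/10/12 bit split is
assumed from the original Twitter layout.

```python
import time


class SnowflakeGenerator:
    """Single-threaded sketch; a real one would need a lock around next_id."""

    def __init__(self, machine_id, clock=None, epoch_ms=0):
        if not 0 <= machine_id < 1024:  # 10 bits of machine ID (assumed)
            raise ValueError("machine ID must fit in 10 bits")
        self.machine_id = machine_id
        # clock returns milliseconds; injectable so skew can be simulated
        self.clock = clock or (lambda: int(time.time() * 1000))
        self.epoch_ms = epoch_ms
        self.last_ms = -1
        self.sequence = 0

    def next_id(self):
        now = self.clock() - self.epoch_ms
        if now < self.last_ms:
            # Clock went backwards: issuing an ID could repeat an old one.
            raise RuntimeError(
                f"clock moved backwards by {self.last_ms - now} ms"
            )
        if now == self.last_ms:
            # Same millisecond: bump the 12-bit sequence (assumed width).
            self.sequence = (self.sequence + 1) & 0xFFF
            if self.sequence == 0:
                # 4096 IDs already issued this millisecond; wait for the next.
                while now <= self.last_ms:
                    now = self.clock() - self.epoch_ms
        else:
            self.sequence = 0
        self.last_ms = now
        return (now << 22) | (self.machine_id << 12) | self.sequence
```

With this policy the tolerance for a backwards jump is effectively zero, and
the generator is unavailable until the clock catches up to the last
timestamp it used. Forward jumps are harmless to uniqueness. Whether
failing, waiting, or something smarter is the right behavior for Fineract is
exactly the kind of thing I'd want real deployment data for.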

Is there some useful way we can describe actual in-practice scaling needs
for Fineract deployments? Actual memory, network, and on-disk storage needs
and capacities would surely be helpful in deciding if 64 bits vs. 128 bits
is consequential.
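
For a first-order feel of the 64-bit vs. 128-bit difference, here is a
back-of-the-envelope Python sketch. It only counts raw key bytes: one copy in
the row plus one copy per index or foreign-key column that repeats the key.
It ignores all database-specific overhead (page fill, headers, compression),
the function name is mine, and the row count below is made up, not Fineract
data.

```python
def key_storage_bytes(rows, key_bytes, extra_copies=0):
    """Raw bytes spent on keys: one copy per row plus extra_copies more
    for each index or foreign key that stores the same key again."""
    if rows < 0 or key_bytes <= 0 or extra_copies < 0:
        raise ValueError("arguments must be non-negative (key_bytes > 0)")
    return rows * key_bytes * (1 + extra_copies)


# Hypothetical deployment: 100 million rows, key repeated in 3 other places.
ROWS = 100_000_000
EXTRA_COPIES = 3

bytes_64bit = key_storage_bytes(ROWS, 8, EXTRA_COPIES)    # snowflake / BIGINT
bytes_128bit = key_storage_bytes(ROWS, 16, EXTRA_COPIES)  # UUID stored as binary
difference = bytes_128bit - bytes_64bit
```

With these invented numbers the difference comes to 3.2 GB of raw key data.
Whether that matters depends entirely on the real row counts and hardware,
which is why actual figures would help.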

Paul, thank you for sharing a ton of past experience / lessons learned! I
confess I don't understand it all and how relevant it is, but I'd like to.

After skimming the discussion so far I'm leaning towards snowflake IDs
because we don't want to target "perfection" and eschew "good enough", but
I feel like I may be missing something Paul had to deal with that would end
up being a problem for Fineract. I also have this gut feeling that UUIDs
would somehow be safer in the end because I'm having trouble finding
useful academic research/case studies on snowflake IDs. Still, I'm +1
snowflake IDs given the relative ease of migration and I'm not convinced of
the drawbacks (again, not knowing actual Fineract use cases).

Maybe we should do a video call? I'd love to learn more. Fascinating
discussion, folks!

--
Adam Monsen
Software Engineer ~ Mifos Initiative
Apache Fineract Release Manager
PGP key id 0xA9A14F22F57DA182
