Rich Pieri wrote on 2026-01-22 08:02:
> You dropped the "arbitrary". When everything fits neatly into
> tables or key/value stores then sure, a database might work. Email
> messages are not neat. They are very much like medical records:
> arbitrary in size and structure. Few databases can deal well with
> this kind of data.

Quite a few actually *can* deal with this kind of data; it's been a
fairly mature field for a decade or more already.
You can read about databases designed to work with unstructured data at
your leisure:
https://en.wikipedia.org/wiki/NoSQL

> You can read up on all of Oracle's failures in the EMR space, and
> the history of WinFS, at your own pace, as examples of this.

So two companies that are bad at software had failures, and therefore
a database is doomed to fail in this field? I disagree.
What about Gmail? They use a database (Bigtable, whose design inspired
LevelDB, from which RocksDB was forked) and it undeniably works:

Google Infrastructure Supporting Gmail

  Technology   Purpose
  ----------   -------
  MapReduce    Processes large volumes of data, such as email indexing
  BigTable     Stores structured metadata and user preferences

https://www.dhiwise.com/post/understanding-gmail-architecture-a-comprehensive-guide

In this case, Stalwart is attempting, with funding from the European
Commission’s Next Generation Internet programme¹ and GitHub's Secure
Open Source Fund (OSSF)², to replace reliance on Gmail / Yahoo /
Outlook / iCloud / etc.
With the goal of competing at that scale:

  "[...] to host hundreds of millions of email accounts reliably? How do
  they store petabytes of messages, survive hardware failures without
  losing data, and keep spam at bay across billions of daily
  deliveries?"

the thought of billions (or millions) of Maildir files per day is laughable.
A *lot* of people use web mail, so searching those millions / billions
of messages *must* be fast.
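
To make the Maildir point concrete, here is a minimal sketch using
Python's standard mailbox module. The temporary directory, addresses and
message bodies are all made up for illustration. It only shows that
Maildir keeps one file per message, so a search without a separate index
has to open and parse every one of those files:

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_maildir(root, count):
    """Create a Maildir under root and deliver `count` made-up messages."""
    box = mailbox.Maildir(os.path.join(root, "Maildir"), create=True)
    for i in range(count):
        msg = EmailMessage()
        msg["From"] = "sender@example.org"   # hypothetical address
        msg["To"] = "user@example.org"       # hypothetical address
        msg["Subject"] = f"message {i}"
        msg.set_content("invoice attached" if i % 10 == 0 else "hello")
        box.add(msg)                         # one new file per message
    box.flush()
    return box


def naive_search(box, word):
    """Unindexed search: every message file is opened and parsed."""
    hits = []
    for _key, msg in box.iteritems():
        if word in msg.get_payload():
            hits.append(msg["Subject"])
    return sorted(hits)


with tempfile.TemporaryDirectory() as root:
    box = build_maildir(root, 50)
    # Newly delivered messages land in the "new" subdirectory.
    files_on_disk = len(os.listdir(os.path.join(root, "Maildir", "new")))
    matches = naive_search(box, "invoice")
    box.close()
```

With 50 messages that is 50 files and 50 opens per search. Real
Maildir-based servers add their own index files to avoid this, which is
effectively a small database sitting next to the mail.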
I'm not yet sold on converting to Stalwart. It's extremely promising,
and that feels uncommon these days, but I've encountered some issues
that may be its fault, or the reverse proxy's fault, or KDE's fault, or
my fault. They are still undiagnosed.
And the schema for their data is all serialized key/value storage, even
in PostgreSQL. That bothers me a bit and I'll have to evaluate some more.
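
For anyone unsure what "serialized key/value storage" inside a SQL
database looks like, here is a rough sketch. It is NOT Stalwart's actual
schema: the table name, key layout and JSON encoding are my own
assumptions, and SQLite stands in for PostgreSQL so that it runs
anywhere. The point is that the database only sees opaque keys and
values, so it can do key-range scans but cannot filter on fields inside
a value:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: a single table of opaque keys and values.
conn.execute("CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB NOT NULL)")


def put(account, msg_id, record):
    """Store one record as serialized bytes under a composite key."""
    key = f"msg/{account}/{msg_id:08d}".encode()
    conn.execute(
        "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
        (key, json.dumps(record).encode()),
    )


def scan(account):
    """Prefix range scan over keys; values are decoded by the application."""
    prefix = f"msg/{account}/".encode()
    rows = conn.execute(
        "SELECT v FROM kv WHERE k >= ? AND k < ? ORDER BY k",
        (prefix, prefix + b"\xff"),
    )
    return [json.loads(v) for (v,) in rows]


put("alice", 1, {"from": "bob@example.org", "subject": "hello"})
put("alice", 2, {"from": "carol@example.org", "subject": "invoice"})
put("bob", 1, {"from": "alice@example.org", "subject": "re: hello"})

alice_subjects = [r["subject"] for r in scan("alice")]
# SQL cannot see inside the values, so field filters happen in the application.
from_carol = [r for r in scan("alice") if r["from"] == "carol@example.org"]
```

The upside of this style is that the same code can run on top of several
different backends. The downside is the part that bothers me: ad-hoc SQL
queries against the data are not really possible.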
What these developers have created in a short time is so highly polished
and feature complete that I have nothing but respect for their abilities
and reasonable confidence in their skills & choices.
We'll see.
Their discussion / recommendations on data storage are here:
https://stalw.art/docs/install/store/
¹ https://stalw.art/blog/nlnet-grant-collaboration/
² https://stalw.art:8443/blog/github-ossf#about-githubs-ossf