> On Feb 23, 2017, at 4:21 PM, Timo Sirainen <t...@iki.fi> wrote:
> 
> On 23 Feb 2017, at 23.00, Timo Sirainen <t...@iki.fi> wrote:
>> 
>> I mainly see such external databases as additional reasons for things to 
>> break. And even if not, additional extra layers of latency.
> 
> Oh, just thought that I should clarify this and I guess other things I said. 
> I think there are two separate things we're possibly talking about in here:
> 
> 1) Temporary state: This is what I was mainly talking about. State related to 
> a specific IMAP session. This doesn't take much space and can be stored in 
> the proxy's memory since it's specific to the TCP session anyway.

Moving the IMAP session state to the proxy, so the backend can just have a 
fixed pool of worker processes, is really what I think is necessary for 
scaling to millions of IMAP sessions. I still think it would be best to store 
this state in a way that lets you at least “remember” which backend server is 
implementing the IMAP session, along with the auth data. To me, that means 
using Redis for session state. Redis is a very efficient in-memory database 
whose data can be made persistent and replicated. And it is popular enough to 
be well tested and easy to use (the API is very simple).
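To make the idea concrete, here is a minimal sketch of what a proxy could 
keep per session. The key schema and field names are my own invention, not an 
existing Dovecot feature:

```python
import json

# Hypothetical key schema for proxy-side IMAP session state; the
# "imap:session:..." naming and the fields below are illustrative only.
def session_key(session_id):
    return f"imap:session:{session_id}"

def serialize_state(user, backend, selected_mailbox):
    # Everything a proxy would need to resume the session after failover:
    # the authenticated user, the backend implementing the session, and
    # the currently selected mailbox.
    return json.dumps({
        "user": user,
        "backend": backend,
        "selected": selected_mailbox,
    })

key = session_key("a1b2c3")
state = serialize_state("kevin", "10.0.1.1:143", "INBOX")
# With redis-py and a live server, a proxy would then run something like:
#   r = redis.Redis(); r.set(key, state, ex=3600)  # expire with the session
print(key, state)
```

Any surviving proxy can then look the session up by its ID and find which 
backend still holds it.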

I use HAProxy for my web servers and HAProxy supports “stick” tables to map a 
client IP to the same backend server that was selected when the session was 
first established. HAProxy then supports proxy “peers” where the “stick” tables 
are shared between multiple proxies. That way, if a proxy fails, I can move the 
VIP over (or let DNS round-robin) to another proxy and still get the same 
backend (which has session state) without having the proxy pick some other 
backend (losing the backend session state). Sharing these “stick” tables 
across a large cluster of proxies might be fairly complex for HAProxy itself, 
but I would think it would be easy to use Redis to cache this data so that 
all proxies could access it.
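For reference, here is a sketch of how the stick-table/peers mechanism could 
look in front of IMAP backends (the names and addresses are made up for 
illustration):

```
# Peers section: each proxy exchanges its stick tables with the others.
peers mypeers
    peer proxy1 10.0.0.1:1024
    peer proxy2 10.0.0.2:1024

backend imap_backends
    mode tcp
    balance roundrobin
    # Pin each client IP to the backend it was first sent to, and share
    # that mapping with the other proxies listed in the peers section.
    stick-table type ip size 1m expire 8h peers mypeers
    stick on src
    server imap1 10.0.1.1:143 check
    server imap2 10.0.1.2:143 check
```

With this, a failover proxy that takes over the VIP already knows which 
backend each client was pinned to.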

I’m not sure whether Dovecot proxies would benefit from “sticks and peers” 
for the IMAP protocol, but it would be nice if Dovecot proxies could maintain 
the IMAP session when connections need to be moved to another proxy (for 
failover). Maybe it isn’t so bad if a Dovecot proxy suddenly “kicked” 10 
million IMAP sessions, but that might lead to a “login” flood on the 
remaining proxies. So, at the very least, the authorization data (the passdb 
query results) should be shared between proxies using Redis.
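One way that sharing could look is a sketch based on Dovecot 2.2’s dict 
passdb and its Redis dict backend (the Redis key layout here is illustrative):

```
# dovecot.conf: answer passdb lookups through the dict interface
passdb {
  driver = dict
  args = /etc/dovecot/dovecot-dict-auth.conf.ext
}

# dovecot-dict-auth.conf.ext: back the dict with a Redis instance
# that all proxies share
uri = redis:host=127.0.0.1:port=6379
password_key = dovecot/passdb/%u
```

Each proxy would then answer logins from the shared Redis instead of 
flooding the primary passdb after a failover.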

> 
> 2) Permanent state: This is mainly about the storage. A lot of people use 
> Dovecot with NFS. So one possibility for storing the permanent state is NFS. 
> Another possibility with Dovecot Pro is to store it to object storage as 
> blobs and keep a local cache of the state. A 3rd possibility might be to use 
> some kind of a database for storing the permanent state. I'm fine with the 
> first two, but with 3rd I see a lot of problems and not a whole lot of 
> benefit. But if you think of the databases (or even NFS) as blob storage, you 
> can think of them the same as any object storage and use the same obox format 
> with them. What I'm mainly against is attempting to create some kind of a 
> database that has structured format like (imap_uid, flags, ...) - I'm sure 
> that can be useful for various purposes but performance or scalability isn't 
> one of them.

I would separate the permanent state into two parts: the indexes and the 
message data. As I understand it, the indexes are the metadata about the 
message data. I believe that, to scale, the indexes need fast read access, 
which suggests storing them on local NVMe SSD storage. But I also want the 
indexes to be reliably shared between all backend servers in a Dovecot 
cluster. Again, this says to me that you need a fast in-memory database like 
Redis to be the “source of truth” for the indexes. Read requests to Redis are 
very fast, so you might not even need a cache of the indexes on local NVMe 
SSD storage, but maybe I’m wrong.
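As a sketch of what “indexes in Redis” could mean, one hash per mailbox 
mapping each IMAP UID to its flags would work. Again, the schema is my own 
invention, not an existing Dovecot format:

```python
import json

# Illustrative schema: one Redis hash per (user, mailbox), with one hash
# field per message UID; the field value is the flag list as JSON.
def index_key(user, mailbox):
    return f"imap:index:{user}:{mailbox}"

def flags_field(uid, flags):
    # Redis hash fields are strings, so the UID is stringified.
    return str(uid), json.dumps(flags)

key = index_key("kevin", "INBOX")
field, value = flags_field(17, ["\\Seen", "\\Answered"])
# With redis-py and a live server, a backend would write and read with:
#   r = redis.Redis(); r.hset(key, field, value)
#   r.hgetall(key)   # the whole mailbox index in one round trip
print(key, field, value)
```

A flag update is then a single HSET on the shared instance, visible to every 
backend at once.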

As for the message data, I would really like the option of storing it in an 
external database like MongoDB. MongoDB stores documents as JSON (actually 
BSON), which seems like a good fit for email storage since emails are all 
text. This would let me manage storage using the tools and techniques of the 
external database. MongoDB is designed to be hugely scalable and supports 
high availability. I would rather manage a cluster of MongoDB instances 
containing a petabyte of data than try to distribute that data among many 
Dovecot IMAP servers. The IMAP servers would then only be responsible for 
implementing IMAP, not loaded down with all sorts of I/O, and so might be 
able to scale to 10 million IMAP sessions per server.
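A message-as-document layout could look like the sketch below. The schema is 
my own illustration, not an existing Dovecot/MongoDB integration:

```python
# Illustrative message document; field names are made up for this example.
message_doc = {
    "user": "kevin",
    "mailbox": "INBOX",
    "uid": 17,
    "flags": ["\\Seen"],
    "headers": {"From": "t...@iki.fi", "Subject": "scaling dovecot"},
    "body": "Hello, world.",
}
# With pymongo and a live server, storing and fetching would be roughly:
#   from pymongo import MongoClient
#   db = MongoClient().mail
#   db.messages.insert_one(message_doc)
#   db.messages.find_one({"user": "kevin", "mailbox": "INBOX", "uid": 17})
print(sorted(message_doc))
```

Sharding on the user field would then let MongoDB, rather than Dovecot, 
decide where a petabyte of mail physically lives.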

If a MongoDB option weren’t available, using cloud object storage would be a 
reasonable second choice. Unfortunately, the “obox” support you mentioned 
doesn’t seem to be open source. So, I am stuck using local disks (hopefully 
SSDs, but those are pricey) on multiple backend servers. I had reliability 
problems using NFS on a previous project, so I am hesitant to try it for 
scaling Dovecot. Fortunately, my mailboxes are all very small (maybe 2 MB per 
user) since I delete messages older than 30 days and store attachments 
(photos and videos) in cloud object storage served with local web server 
caching. So, scaling message data shouldn’t be an issue for me for a long 
time.

Kevin
