On Fri, Jul 15, 2005 at 02:02:27PM -0400, Geo Carncross wrote:
> Won't work. As much as it seems like this would be a good idea (and
> believe me: about half a dozen people on this list have had it, so
> it certainly is a good idea. better still, don't believe me, check
> the archive yourself :) )
Thanks. I've now gone through a bunch of archived messages looking to
understand this.
Suggestion in short: why not keep track of which messages have been
seen by each server, and if a server senses a potential issue
(ie. after network problems are fixed, or whatever) correct it then?
[More detail later in the message.]
As I understand it, the requirements are like so:
(R0) Dropping email is not acceptable.
(R1) Follow RFC: UIDs must be 32-bit values.
(R2) Follow RFC: UIDs must monotonically increase. In particular, a
replication that results in messages with UIDs that precede a UID
value reported to a user is bad, and must be corrected.
(R3) Each server in a replicating cluster should be read-write,
ie. multimaster updates should be allowed.
(R4) Multimaster should work gracefully in a failure scenario where
client and internet can reach each server, but servers cannot
reach each other.
Assumptions:
(A0) Duplicating email (ie. causing the user to see it a second time)
isn't so bad if it's infrequent.
(A1) Users start new IMAP sessions relatively infrequently -- a new
connection is not established within seconds of a previous
connection.
(A2) Users should have an affinity to a particular server, and should
only switch/be switched to another server in the event of a
failure.
(A3) The most likely mode of failure is that a server is unreachable
by a user. This is the scenario that should be most engineered
to not duplicate mail.
(A4) Loss of connectivity between two or more mail servers while
both/all servers are still visible to the same users is a rare
occurrence. Mail server should meet minimal requirements (ie. not
drop mail) but some duplicate mail in this case is acceptable.
This can be mitigated by giving users an affinity for a server.
Suggestion in detail:
(S0) Each server in a multimaster cluster is assigned a unique
server_id. If multimaster isn't necessary, server_id for all
servers is 0.
(S1) Locally generate UIDs in a way that is globally unique
(ie. splitting message sequence count using a local_sequence *
num_servers + server_id type scheme, or some similar method.)
(S2) Each server keeps a replicated table "high_saved" of the last
locally-generated UID it's saved. Index by mailbox and
server_id.
(S3) Each server keeps a replicated table "high_reported" of the last
UID it's reported to the client. Index by mailbox and server_id.
(S4) Each server keeps a replicated table, "process_message_UID", that
is basically a message to each other server to make sure the UID
is OK. Index by mailbox, remote server_id, UID.
(S5) Each time an email arrives at a server (via SMTP or IMAP, not via
replication), the server generates a new UID using scheme from
part (S1) that is greater than any value currently in high_saved,
for any server. Then, for each server_id other than itself, it
creates a row in process_message_UID. Then, update high_saved
with the new UID.
(S6) Periodically, as a maintenance thread, each server checks
process_message_UID for any message sent to its server_id.
LOOP foreach message: if the message UID is lower than the last
UID this server has reported for that mailbox (high_reported),
change the UID of the email to a new UID as per (S1) and (S5),
and delete the message from process_message_UID. If greater than
or equal, delete the message from process_message_UID without
taking any action.
(S7) When the user client connects and wants the last UID, first
perform step (S6).
When done processing all messages in the process_message_UID
queue for that server_id, report new high UID to user and update
high_reported.
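
To make (S1) and (S5) concrete, here is a rough Python sketch, not a
tested implementation. The names next_uid and accept_message, the plain
dict and set standing in for the replicated high_saved and
process_message_UID tables, and the use of 0 to mean "nothing saved yet"
are all assumptions made for illustration. It ignores the 32-bit limit
from (R1) and any real replication.

```python
def next_uid(server_id, floor, num_servers):
    """(S1) Smallest UID above `floor` of the form seq * num_servers + server_id."""
    uid = (floor // num_servers) * num_servers + server_id
    while uid <= floor:
        uid += num_servers
    return uid


def accept_message(server_id, mailbox, num_servers, high_saved, queue):
    """(S5) Allocate a UID, queue it for every other server, record it."""
    # Highest UID any server is known to have saved in this mailbox.
    floor = max(
        (uid for (mb, _sid), uid in high_saved.items() if mb == mailbox),
        default=0,
    )
    uid = next_uid(server_id, floor, num_servers)
    for other in range(num_servers):
        if other != server_id:
            queue.add((mailbox, other, uid))  # one process_message_UID row
    high_saved[(mailbox, server_id)] = uid
    return uid


# Two servers (ids 0 and 1) that see each other's high_saved immediately.
high_saved, queue = {}, set()
a = accept_message(0, "INBOX", 2, high_saved, queue)
b = accept_message(1, "INBOX", 2, high_saved, queue)
c = accept_message(0, "INBOX", 2, high_saved, queue)
print(a, b, c)  # prints: 2 3 4
```

With one server (num_servers = 1, server_id = 0) the same functions hand
out 1, 2, 3, ... and never queue anything, which is the degenerate case
described in (E1) below.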
Examples/scenarios/analysis:
(E1) Single server, or multiple servers with a single master. Step
(S1) degenerates into a simple sequence. Step (S5) does the
same. Steps (S6) and (S7) are basically skipped, since there are
no other servers to exchange messages with, so the loops are
empty. So the server does no additional heavy lifting.
(E2) Multimaster load sharing, communication OK between servers: all
messages assigned UIDs uniquely. In general, UIDs will increase,
but under some race conditions, a server will perceive a UID to
step back due to replication. If no client actually asked about
UIDs during the race condition, no action is taken. If a user
timed things "just right", so that message with ID N arrives on
server A and message with ID N+1 arrives on server B, and user
queries B, gets N+1, then email is replicated to B, and the
replication spreads the news that A has a message UID N for B. B
should auto-sense the problem (the next time the user queries B,
or the next time B does its maintenance check) and B should
update the UID to something beyond the current known max. So, if
the second client query is to B, the client will automatically
correct. If the second client query is to A, the server will
initially give an old UID, but then B will correct it. If the
configuration is such that clients prefer their last server or a
certain server, the second client query is more likely to go to
server B, which is better.
Note #1: User will sometimes seem to have a duplicate email.
Since duplicate email is more acceptable than lost email, this
should be acceptable in most environments.
Note #2: if clients restart sessions very often in this scenario,
it's possible to have thrashing. But under normal conditions,
ie. where new sessions are relatively rare, this should be a
relatively rare occurrence.
(E3) Multimaster, connection breaks, user can only reach one server:
until the connection breaks, communication is the same as in
scenario (E2). Once the connection breaks, each server is
generating UIDs locally without being aware that the other server
is assigning them as well. If the user can only connect to one
server (ie. user is at a WAN site, WAN site has local server "A",
WAN connection is down) the user can continue to send and receive
mail using the local server. Server "A" will update
high_reported appropriately. Remote server "B" may continue to
receive mail for the user, but high_reported will not be updated.
When connectivity is restored, so long as user continues (for the
short term) to use the same server, no duplicate email should
result.
(E4) Multimaster, connection between servers breaks, user can reach
both: until the connection breaks, communication is the same as
in scenario (E2). If the user communicates with both servers,
each server will independently increase reported UID. When
connection is reestablished, one or both servers will reassign
UIDs to the other's email, resulting in apparently duplicate
email. Gotta break some eggs. Can be mitigated if user has
affinity for last mail server.
(E5) When connectivity is broken between servers and then is
reestablished, there will be a while when the server is both
catching up and receiving new email and IMAP connections. There
is potential here for duplicate email. Can be mitigated if user
has affinity for last mail server.
(E6) The process_message_UID table will be a choke point if you have a
lot of servers. This scheme is good for high availability, bad
for scalability.
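
Going back to the race in (E2), the correction done by (S6) and (S7)
might look roughly like this from server B's side. Again this is only a
sketch under assumptions: the function names, the single `state` dict
standing in for B's copy of the replicated tables, and the concrete UIDs
(A saved 4, B saved and reported 5) are invented for the example. The
renumbered message is re-queued for the other server because (S6) says to
allocate "as per (S5)".

```python
def next_uid(server_id, floor, num_servers):
    """(S1) Smallest UID above `floor` of the form seq * num_servers + server_id."""
    uid = (floor // num_servers) * num_servers + server_id
    while uid <= floor:
        uid += num_servers
    return uid


def max_saved(mailbox, state):
    """Highest UID recorded in high_saved for this mailbox, across all servers."""
    return max((u for (mb, _sid), u in state["high_saved"].items()
                if mb == mailbox), default=0)


def check_queue(server_id, mailbox, num_servers, state):
    """(S6) Renumber queued UIDs that fall below what this server reported."""
    reported = state["high_reported"].get((mailbox, server_id), 0)
    box = state["messages"][mailbox]
    rows = sorted(r for r in state["queue"]
                  if r[0] == mailbox and r[1] == server_id)
    for row in rows:
        uid = row[2]
        if uid < reported:
            new_uid = next_uid(server_id, max_saved(mailbox, state), num_servers)
            box[new_uid] = box.pop(uid)  # same mail, new UID
            state["high_saved"][(mailbox, server_id)] = new_uid
            for other in range(num_servers):
                if other != server_id:
                    state["queue"].add((mailbox, other, new_uid))
        state["queue"].discard(row)  # row is handled either way


def report_high_uid(server_id, mailbox, num_servers, state):
    """(S7) Drain the queue first, then report and record the high UID."""
    check_queue(server_id, mailbox, num_servers, state)
    high = max(state["messages"][mailbox])
    state["high_reported"][(mailbox, server_id)] = high
    return high


A, B = 0, 1
# B's view just after A's message (UID 4) has replicated in, and after B
# already told a client that the high UID was 5.
state = {
    "messages": {"INBOX": {4: "mail via A", 5: "mail via B"}},
    "high_saved": {("INBOX", A): 4, ("INBOX", B): 5},
    "high_reported": {("INBOX", B): 5},
    "queue": {("INBOX", B, 4)},
}
new_high = report_high_uid(B, "INBOX", 2, state)
print(sorted(state["messages"]["INBOX"]), new_high)  # prints: [5, 7] 7
```

The mail that arrived via A ends up as UID 7, above the 5 that B had
already reported, so the client picks it up on its next query to B. The
row re-queued for A is the kind of back-and-forth that Note #2 warns can
turn into thrashing if sessions restart very often.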
OK, I probably spent way too long thinking this through and working
scenarios. Did I miss anything?
- Morty