On Fri, Jul 22, 2005 at 10:46:50AM -0400, Geo Carncross wrote:

> > (A2) Users should have an affinity to a particular server, and should
> >      only switch/be switched to another server in the event of a
> >      failure.
> 
> Incorrect. Many people want load-balancing. High-availability access to
> reading emails. They must be willing to accept delays of new messages
> when the cluster is damaged.

The load balancers I've seen offer session affinity as a basic feature,
although it is not necessarily enabled by default.  I.e., with affinity
enabled, once userA has used serverX, the load balancer will try to
keep sending userA to serverX.  Meanwhile, once userB has used
serverY, the load balancer will try to keep userB on serverY.  You
still get load balancing; your users just have an affinity for a
particular server.  Affinity isn't necessary for the design to work,
but it lessens the probability of duplicate email.
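
To make the affinity behaviour concrete, here is a minimal sketch of a
sticky balancer.  It is not any particular product's algorithm; the
class name, the round-robin fallback, and the mark_down/mark_up hooks
are all assumptions made for illustration.

```python
class StickyBalancer:
    """Toy load balancer with session affinity (all names are hypothetical)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.up = set(self.servers)   # servers currently believed healthy
        self.affinity = {}            # user -> server the user was last sent to
        self._next = 0                # round-robin cursor for new or displaced users

    def route(self, user):
        """Return the server for this user, preferring the one used before."""
        server = self.affinity.get(user)
        if server not in self.up:     # no affinity yet, or the preferred server is down
            server = self._pick()
            self.affinity[user] = server
        return server

    def _pick(self):
        # Plain round-robin over healthy servers (an assumed fallback policy).
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next % len(self.servers)]
            self._next += 1
            if candidate in self.up:
                return candidate
        raise RuntimeError("no servers available")

    def mark_down(self, server):
        self.up.discard(server)

    def mark_up(self, server):
        self.up.add(server)


lb = StickyBalancer(["serverX", "serverY"])
first = lb.route("userA")            # userA lands on some server...
assert lb.route("userA") == first    # ...and keeps going back to it
lb.mark_down(first)
second = lb.route("userA")           # only a failure moves the user
assert second != first
```

Note that in this sketch the user stays on the second server even after
the first comes back, which matches "affinity for last mail server".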

> Won't work. A client that performs the following operations will lose
> email:
> 
> * Client (C) connects to host (A) sees uidvalidity mismatch; gets 1,2,4
> * C connects to host (B) sees exists "4", tries to fetch uid 5, fails.
> 
> The problem is that "B" really has uids "1,2,3,4" but the client saw a
> "gap" at "3" and will never download "3". RFC2060 _recommends_ clients
> perform this optimization, and many clients do.

The design actually ensures, in this exact scenario, that the client
does see the mail.  When B creates the email with UID 3, B sends A a
message saying "I made an email with UID 3".  A receives that message
after it has already told the user that the last UID is 4.  Since 4 is
greater than 3, A will MODIFY the UID in the database from 3 to 5.
That change in UID will replicate back to B.  After that correction,
the next time the user connects to either server, the user will pick
up the new email.  Effectively, you have strictly increasing UIDs.
See?
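
The correction rule described above can be sketched roughly as follows.
This is only an illustration: the Mailbox class and its method names
are made up, replication back to B is not modelled, and picking "one
more than the highest UID known" as the new UID is an assumption.

```python
class Mailbox:
    """One server's view of a mailbox (hypothetical names throughout)."""

    def __init__(self, name):
        self.name = name
        self.messages = {}       # uid -> message identifier
        self.high_reported = 0   # highest UID this server has shown to a client

    def deliver_local(self, uid, msg_id):
        self.messages[uid] = msg_id

    def report_to_client(self):
        """Client session: list UIDs and remember the highest one reported."""
        if self.messages:
            self.high_reported = max(self.high_reported, max(self.messages))
        return sorted(self.messages)

    def on_peer_created(self, uid, msg_id):
        """Handle the peer's "I made an email with UID n" message.

        If the client has already been told about a UID at or above n,
        move the message above everything reported so far.
        Returns the UID the message ends up with.
        """
        if uid <= self.high_reported:
            uid = max(self.high_reported, max(self.messages, default=0)) + 1
        self.messages[uid] = msg_id
        return uid


a = Mailbox("A")
for uid in (1, 2, 4):
    a.deliver_local(uid, f"msg-{uid}")
a.report_to_client()                       # client sees 1, 2, 4; high_reported = 4
new_uid = a.on_peer_created(3, "msg-3")    # B's delayed notice arrives
assert new_uid == 5                        # 3 is renumbered above 4
```

A client that skipped the "gap" at 3 still finds the message, because
it now sits at a UID higher than anything the client has seen.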

The reason I keep worrying about duplicates is that you have a
potential problem where email with UID 3 arrives at B, and UID 4
arrives at A.  If replication is delayed, the user could connect to B
and read 3, and then connect to A and read UID 4.  When replication is
fixed, A receives a message about UID 3, realizes it didn't have it at
the time the user read UID 4, and updates the UID of 3 from 3 to 5.
The user connects again, and is told that there is a new message --
UID 5 -- which is really UID 3 all over again.  This can be mitigated
by running load balancers with user affinities.  So long as the user
stays at one server, the user will always receive ALL mail, and will
never see duplicates.  If that server goes down and the user switches
to another server, duplicates can result.

> > (E3) Multimaster, connection breaks, user can only reach one server:
> >      until the connection breaks, communication is the same as in
> >      scenario (E2).  Once the connection breaks, each server is
> >      generating UIDs locally without being aware that the other server
> >      is assigning them as well.  If the user can only connect to one
> >      server (ie. user is at a WAN site, WAN site has local server "A",
> >      WAN connection is down) the user can continue to send and receive
> >      mail using the local server.  Server "A" will update
> >      high_reported appropriately.  Remote server "B" may continue to
> >      receive mail for the user, but high_reported will not be updated.
> >      When connectivity is restored, so long as user continues (for the
> >      short term) to use the same server, no duplicate email should
> >      result.
> 
> But mail will be lost if mail can be received at "A" as well (think:
> local mail)

No mail lost.  The server that the user has been using will receive a
flood of messages saying "I made UID X while we couldn't talk".  The
server will say "oops, I told the user about UID Y, and X is less than
Y, and I didn't know about X", so it will update the messages with UID
Z where Z is bigger than Y.  No mail lost.

> > (E4) Multimaster, connection between servers breaks, user can reach
> >      both: until the connection breaks, communication is the same as
> >      in scenario (E2).  If the user communicates with both servers,
> >      each server will independently increase reported UID.  When
> >      connection is reestablished, one or both servers will reassign
> >      UIDs to the other's email, resulting in apparently duplicate
> >      email.  Gotta break some eggs.  Can be mitigated if user has
> >      affinity for last mail server.
> 
> What if the connection BETWEEN servers breaks, but the client can still
> access each (say they have a private dial-up connection)?

That's exactly the scenario.  The user still eventually ends up with
all email, but duplicated.  I.e.: connectivity between servers is
broken.  UIDs 1, 3, 5 arrive at A.  UIDs 2, 4, 6 arrive at B.  A sends
B a message saying it has UIDs 1, 3, 5, and B sends A a message saying
it has UIDs 2, 4, 6.  But the messages are delayed because of
connectivity loss.  Meanwhile, the user connects to A and reads 1 & 3,
and connects to B and reads 2, 4, & 6.  When connectivity is restored,
a whole flurry of messages goes between the servers.  A says that
message 2 gets new UID 7 because it found out about 2 after the user
read 3.  B says that 1, 3, and 5 all need new UIDs (8, 10, 12) because
the messages arrived after the user read UID 6.  The user sees 4 new
messages (7, 8, 10, 12) when really only one is new (12, i.e. the
former message 5).  Which, again, can be mitigated by affinities.
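
For anyone who wants to check the arithmetic, here is a small sketch
that replays this example.  The simulate() function is hypothetical,
and it assumes that A only hands out odd UIDs and B only even ones;
that is just one allocation scheme that reproduces the numbers above,
nothing earlier in the thread specifies it.

```python
def simulate(reads):
    """reads: {server: UIDs the user read there while the link was down}.

    Returns (UIDs that look new to the user afterwards,
             UIDs that really are new to the user).
    """
    local = {"A": [1, 3, 5], "B": [2, 4, 6]}   # UIDs assigned during the outage
    parity = {"A": 1, "B": 0}                  # assumption: A odd, B even
    high_reported = {s: max(reads.get(s, ()), default=0) for s in local}
    highest = max(max(uids) for uids in local.values())

    def fresh_uid(server):
        # Next UID above everything in use that belongs to this server.
        nonlocal highest
        uid = highest + 1
        if uid % 2 != parity[server]:
            uid += 1
        highest = uid
        return uid

    # Link restored: each server learns of the other's mail and renumbers
    # anything at or below what it has already reported to the user.
    renumbered = {}
    for receiver, sender in (("A", "B"), ("B", "A")):
        for uid in local[sender]:
            if uid <= high_reported[receiver]:
                renumbered[uid] = fresh_uid(receiver)

    already_read = {uid for uids in reads.values() for uid in uids}
    final = {uid: renumbered.get(uid, uid) for uids in local.values() for uid in uids}
    looks_new = sorted(f for f in final.values() if f not in already_read)
    really_new = sorted(f for u, f in final.items() if u not in already_read)
    return looks_new, really_new


# User reached both servers during the outage: three duplicates.
assert simulate({"A": [1, 3], "B": [2, 4, 6]}) == ([7, 8, 10, 12], [12])
# User stayed on A (affinity): everything that looks new is new.
assert simulate({"A": [1, 3]}) == ([4, 5, 6, 7], [4, 5, 6, 7])
```

The second call is the affinity case: only message 2 gets renumbered
(to 7), and the user has never seen it, so there are no duplicates.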

- Morty
