Re: [Dovecot] Replication problem I have. And how I think I can get around problem.

2011-07-04 Thread Ed W
On 03/07/2011 07:20, Peter Dolding wrote:
> Now what I need to be able to deal with the problem.

Have you considered a new Dovecot storage backend?  I have sketched some
designs on a napkin a few times.  Consider some kind of storage server
with eventually consistent replication capabilities.  This could be
used for the metadata storage for all the emails (ie FROM, TO, DATE,
SUBJECT and all the other non-body parts you might search on).

Your replication engine can now work in conjunction with Dovecot to sync
changes between servers as quickly as possible, eg if desired implement
a two phase commit when the LDA delivers new emails, so that all storage
servers confirm they have received the new email.  You could if you wish
implement quorum support (ie some server which was offline for some mail
deliveries proxies requests to another storage block until it's caught
up syncing)
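
A minimal sketch of the two phase commit idea for deliveries, in Python.
The StorageServer class and the deliver() function are hypothetical
in-memory stand-ins made up for illustration; none of this is Dovecot or
LDA API, and a real implementation would also need timeouts and recovery.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical in-memory stand-in for one storage server."""
    name: str
    online: bool = True
    staged: dict = field(default_factory=dict)
    committed: dict = field(default_factory=dict)

    def prepare(self, msg_id, body):
        # Phase 1: stage the message; refuse if the server is unreachable.
        if not self.online:
            return False
        self.staged[msg_id] = body
        return True

    def commit(self, msg_id):
        # Phase 2: make the staged message visible.
        self.committed[msg_id] = self.staged.pop(msg_id)

    def abort(self, msg_id):
        # Roll back a staged message, if there is one.
        self.staged.pop(msg_id, None)


def deliver(servers, msg_id, body):
    """Return True only if every server staged and committed the message."""
    prepared = []
    for server in servers:
        if server.prepare(msg_id, body):
            prepared.append(server)
        else:
            # One server could not confirm: undo the others and give up.
            for done in prepared:
                done.abort(msg_id)
            return False
    for server in prepared:
        server.commit(msg_id)
    return True
```

With two servers online deliver() returns True and both hold the message;
if either is offline nothing is committed anywhere, which is the cost of
insisting that all storage servers confirm receipt.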

You may or may not store the message bodies and attachments with the
mail metadata.  I can see performance arguments depending on what
operations you do most commonly.  A theoretical (but probably horrible
in practice) idea might be to consider DB query latency vs transfer
speed.  The metadata needs to cover 99% of common searches and deliver
results quickly - as such it needs to be near Dovecot and cover the
main message headers, and perhaps also message body structure (list of
parts, etc).  The next tier down is probably satisfying full text
searches of message bodies and supplying body parts, where the most
common queries might be either just text/html parts (smart clients) or
the entire body (common clients).  Some kind of compressed blocking
format for message bodies is probably optimal, and storing larger
attachments separately would be an interesting way to increase cache hit
ratios.

I don't know whether Timo is interested in working on such a project, but
I for one would be interested in sponsoring some work on robust async
replication.  Perhaps there is some crossover with the synchronous
replication that you desire?

I think this is an interesting area to develop.  Cyrus has done some
work on this stuff - I haven't followed it, but it would be interesting
to see what they have done.

Good luck

Ed W


Re: [Dovecot] Replication problem I have. And how I think I can get around problem.

2011-07-04 Thread Peter Dolding
On Mon, 04 Jul 2011 10:46:31, Ed W wrote:
> On 03/07/2011 07:20, Peter Dolding wrote:
>> Now what I need to be able to deal with the problem.
>
> Have you considered a new Dovecot storage backend?  I have plotted some
> designs on a napkin a few times.. Consider some kind of storage server
> with eventually consistent replication capabilities.  This could be
> used for the metadata storage for all the emails (ie FROM, TO, DATE,
> SUBJECT and all the other non body parts you might search on)


Remember I am new to the Dovecot source code.  How to code a Dovecot
storage backend is where I might have to start.

Really I don't care if the servers are ever fully consistent, other than
the fact they both contain the same emails.  Of course shared read status
would be nice.  If that status information is out of date, so be it.

Eventually consistent is a deadly thing to try to aim for.

It's a simple fact of what needs to be backed up.  If a user is at one
physical location all the time and connecting to the same server all the
time, they will never see that the 2 servers are not 100 percent synced.

For my use cases, a lot of the time 100 percent synced will be wasted
effort.  Roughly synced will be more than suitable.

Of course what I am talking about can be used as a foundation to get data
from point a to point b.


> Your replication engine can now work in conjunction with Dovecot to sync
> changes between servers as quickly as possible, eg if desired implement
> a two phase commit when the LDA delivers new emails, so that all storage
> servers confirm they have received the new email.  You could if you wish
> implement quorum support (ie some server which was offline for some mail
> deliveries proxies requests to another storage block until it's caught
> up syncing)


You are missing something here.  My syncs, due to issues, may be chaos.
Both servers may have been running split from each other, both having
received emails the other has not, and both with users accessing them
using the web and local email clients.  While all that is going on the
servers have to sync.

This is very much the worst case.  I am sane.  I am prepared to give up
the users' ability to change client programs between servers to make it
work.  So IMAP unique identifiers will not be replicated, ever.  Basically
any identifier that cannot be based on a server id + a number unique to
that server will be unique only to the server it's on.

Basically I cannot ensure that syncing happens quickly, and most of the
time I don't really need that.  As long as, when nothing is broken, an
email does not sit on a server for more than 15 min without reaching the
person wanting to read it, that would be fine for normal operations.  This
is still faster than Exchange working with POP email accounts.

Basically, Ed W, I am saying this is the worst I can get away with in a
working office.

I have gone through the protocols I would most likely ever need to use
from a business point of view.

pop3: I know it basically does not give a stuff whether what is hiding
behind it is multi master or single master.  A user might receive a few
extra emails if they change between servers.  Nothing system killing.  As
long as the user gets the emails in the end, that is the important bit.
If they get a copy from each server, bad luck; at least they got the
email.  Too many copies is not an issue from a business point of view.
Not getting the copy in the first place is an issue.

smtp: outgoing messages sent directly from multiple locations; it does
not give a stuff either.

imap4: whoever invented this protocol, for what I want to do I feel like
strangling them.  The id system completely sends you to hell.  From a
business point of view this might be an issue if users change between
servers, due to having to download everything again.

activesync/Z-push: OK, nice.  IDs are 64 chars in size with no defined
contents.  No ascending order, so basically no trouble.  So nothing is
stopping me doing server_id:unique number.  So that is most mobile
devices covered.  A message might disappear when moving between not
totally synced servers.  The message will catch up as the syncs do.  So
for business usage it is annoying, but the issue is not long lasting.  Ie
with a custom backend on Z-push, mobile devices will work mostly fine
with chaos between the storage servers.

MAPI: IDs are 64 chars to 512 chars.  Again nothing in the protocol is
stopping server_id:unique number.

Web applications: much of a muchness.  Either they will connect to the
mail server at the same location as the http server, and so be protected,
or they can be hooked onto a new protocol so they know they are
connecting to a multi server back end and can detect if there are issues
going on.  Mostly as bad as activesync: if you change between hosting
locations, email syncs might not have caught up with you, so a message
might disappear temporarily.
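
The server_id:unique number identifiers suggested above for ActiveSync
and MAPI could be built and parsed roughly like this.  A sketch only:
make_id, parse_id and MAX_ID_LEN are made-up names, and the 64 char limit
is simply the figure quoted in this thread, not checked against any spec.

```python
MAX_ID_LEN = 64  # figure quoted in this thread for ActiveSync IDs (assumption)


def make_id(server_id, number):
    """Build a 'server_id:number' identifier, e.g. 'syd1:42'."""
    if ":" in server_id:
        raise ValueError("server_id must not contain ':'")
    ident = f"{server_id}:{number}"
    if len(ident) > MAX_ID_LEN:
        raise ValueError("identifier longer than MAX_ID_LEN")
    return ident


def parse_id(ident):
    """Split an identifier back into (server_id, number)."""
    server_id, _, number = ident.partition(":")
    return server_id, int(number)
```

Because the server id is part of the identifier, two servers can hand out
numbers independently without ever colliding.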

So only imap4 requires syncing so that ids are in order.  Question: does
everyone need imap4 server locations to be interchangeable?  I know I
don't, since most people out of the office use either web mail or
activesync.
Only in office 

[Dovecot] Replication problem I have. And how I think I can get around problem.

2011-07-03 Thread Peter Dolding
I have two servers in two different locations.  Neither is what you would
call 100 percent safe from being turned off.

Most staff use web based email.  This backs onto an imap server.  I do
know I will have to deal with contact lists and other items in that
client.

Worst part is the link between them may get broken, so both servers may
be receiving email and both be active at the same time.

I can see 1 very clear way around this problem: if I accept that the imap
ids at each of the servers will be for that server only.

Since users are mostly using web based email they are not going to
notice.  If they do notice because they have connected to a different
domain address, stiff bad luck.

Now what I need to be able to deal with the problem.

1 a unique server id on each message for the server the message was
received on.
2 a unique server receive id for each message, an imap like id given by
the server to messages it received directly, not to synced ones.
3 logs of deletions and changes to messages that do not carry the current
server's server id.  This should be pretty simple to do.  Basically 1 log
per server, flushed when synced with the server it belongs to.
4 a sync function that deletes and changes messages that have been
deleted or changed at other locations and compares the other server's
current messages against copies retained at the mirroring location.
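
The four items above can be sketched in Python roughly as follows, for
just two servers.  Store and sync are hypothetical names and everything
lives in memory; this is one reading of the list under those assumptions,
not how Dovecot stores or syncs mail.  Only deletions are logged here;
other changes would follow the same pattern.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """One server's mail store; messages keyed by (server_id, receive_id)."""
    server_id: str
    next_receive_id: int = 1
    messages: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)  # owner id -> [(op, key)]

    def receive(self, body):
        # Items 1 and 2: stamp a directly received message with our ids.
        key = (self.server_id, self.next_receive_id)
        self.next_receive_id += 1
        self.messages[key] = body
        return key

    def delete(self, key):
        self.messages.pop(key, None)
        # Item 3: log deletions of messages that another server owns.
        if key[0] != self.server_id:
            self.pending.setdefault(key[0], []).append(("delete", key))


def sync(a, b):
    """Item 4: bring two stores together, in both directions."""
    for src, dst in ((a, b), (b, a)):
        # Flush src's log of changes to messages that dst owns.
        for op, key in src.pending.pop(dst.server_id, []):
            if op == "delete":
                dst.messages.pop(key, None)
    for src, dst in ((a, b), (b, a)):
        owned = {k: v for k, v in src.messages.items()
                 if k[0] == src.server_id}
        # Drop dst's copies of src-owned messages that src no longer has.
        stale = [k for k in dst.messages
                 if k[0] == src.server_id and k not in owned]
        for key in stale:
            del dst.messages[key]
        # Copy over src-owned messages that dst has not seen yet.
        for key, body in owned.items():
            dst.messages.setdefault(key, body)
```

Both servers can keep calling receive() and delete() while the link is
down; a later sync() only needs the logs and the two current message sets.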

Now with this, each new message to each store gets the next imap id along
with a system wide unique combination of server id and server receive id.
No modification of already received messages' ids, since I am not caring
if the imap ids are matched between servers.
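
To illustrate the numbering described above, here is a small sketch in
which every store hands out its own ascending imap id while the (server
id, server receive id) pair stays the same everywhere.  The Mailbox class
and its method names are invented for this example.

```python
import itertools


class Mailbox:
    """Hypothetical per-server mailbox with local imap ids."""

    def __init__(self, server_id):
        self.server_id = server_id
        self._uids = itertools.count(1)
        self._receive_ids = itertools.count(1)
        self.by_uid = {}  # local imap id -> (server_id, receive_id)

    def _store(self, global_id):
        # Every stored message gets the next local imap id, never reused.
        uid = next(self._uids)
        self.by_uid[uid] = global_id
        return uid

    def deliver(self):
        """Directly received message: mint a new system wide id."""
        return self._store((self.server_id, next(self._receive_ids)))

    def import_synced(self, global_id):
        """Message copied from another server: keep its system wide id."""
        return self._store(global_id)
```

After a few deliveries and a sync, the same message can be imap id 2 on
one server and imap id 1 on the other, yet both agree on the pair that
names it.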

Fairly much able to use a custom form of imap syncing working off the
server ids and server receive ids.

Of course this solution should be fairly fault tolerant, since each
server can directly store any message it receives.  Also it should be
possible to trace back to what server the message came from in case of
spam problems or similar.

Now my biggest problem: can I attach my own custom attributes to incoming
mail in the store and access that information effectively?  With that
information I will be able to do a form of live synced storage.

Personally I see the imap id design as a defect in the protocol, since it
never allowed a server id along with it.  This is why 100 percent synced
imap message stores cannot be run well on independent servers with
unstable network connections.

Nothing comes without a price.  This solution does not require clustered
file systems or a constantly active connection between servers, so it is
able to operate in areas of disruption.  It does not block the servers
from receiving emails at any time either.

The problem of course is if someone moves a client directly from 1 server
to another.  Mail will have to be re-downloaded.  Also I have to check if
z-push/ActiveSync depends on the imap ids being dependable across the
network.  If it doesn't, most hand held devices will not be a problem and
then only imap has a problem.  I can live with that.  Ie if the local
email server is down, use web mail until the local mail server is fixed.

Old rule of networking: with a 40 to 50 percent functional network, staff
can normally still get stuff done.  At 0 percent functional you have
downtime.

Now if I can get the email storage fault tolerant and able to remain
operational in case of a fault, I can then focus on getting the web based
applications using an equivalent system for contacts and other things.
So each location can remain fairly operational no matter what.

This way if a server disappeared for good, the only thing that would have
to be changed is syncing.  A wise move is to allow a server to be assigned
more than 1 server id.  Ie when a server is gone for good, the remaining
server gets told to take over responsibility for the old server's
messages.  The new replacement server is given a new id and everything
keeps on going nicely.
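
The takeover idea could be kept in a small table mapping each server id
to whichever machine is currently responsible for it.  A sketch with
made-up names (Ownership, add_server, take_over, ids_for), assuming
machines are identified by a plain name:

```python
class Ownership:
    """Hypothetical table: server id -> machine currently responsible."""

    def __init__(self):
        self.owner_of = {}

    def add_server(self, machine, server_id):
        # A new or replacement machine always gets a fresh server id.
        if server_id in self.owner_of:
            raise ValueError("server id already in use")
        self.owner_of[server_id] = machine

    def take_over(self, dead_machine, survivor):
        # The survivor becomes responsible for every id the dead one held.
        for server_id, machine in self.owner_of.items():
            if machine == dead_machine:
                self.owner_of[server_id] = survivor

    def ids_for(self, machine):
        return sorted(sid for sid, m in self.owner_of.items()
                      if m == machine)
```

Message ids never change in this scheme; only the table entry saying who
answers for an old server id does.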

Does anyone else have a better idea, or advice on how to add my own custom
ids in a way that they cannot be disrupted and are simple to search?

If it works, push for server ids in imap5?  Ie imap5 clients being able to
cope with the event that an email store is spread between multiple servers
that may not be connected all the time.

Peter Dolding