Re: Cyrus replication and failover best practices

2010-08-09 Thread Bron Gondwana
On Mon, Aug 09, 2010 at 08:15:36PM +0400, Dmitry Ivanov wrote:
>   Hello!
> Folks, looking through the mailing list history I saw that many of you
> are running Cyrus in rolling replication mode. I am interested in
> configuring a Cyrus replica as a standby IMAP server, so that we can
> switch DNS to it in case of problems with the primary backend. While
> testing in a sandbox I ran into some problems and several questions
> came up; maybe you can help me solve them.
> 
> 1. Is it safe to leave the "sync_host:" option in imapd.conf and run
> sync_server (via its entry in cyrus.conf) on both master and replica,
> and start sync_client -r only on the master server? Or is it better to
> have different config files for the different roles?

Yeah, that's pretty safe.  We run sync_server on our masters as well
so that we can move users between machines.

I'm not such a fan of the sync_host config variables - I'd prefer to
pass the information on the sync_client command line.  Should go fix
that!
 
> 2. Is there any way to solve the issue where the master overwrites
> messages with the same filename on the replica (messages that were not
> synced before the disaster happened) while syncing back to the primary
> host? "guid_mode: sha1" is set.

We have a patch at FastMail that does it.  There's one against 2.3.16,
or soon it will be the default behaviour with the new sync protocol
(I keep talking about it ...)  It's actually up and running at FastMail
now, so I'll be pushing it back to CVS soon, and we'll work on making
a release.
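To make the overwrite problem in question 2 concrete: Cyrus keeps each message in a file named after its UID, so after a failover the old master and the replica can each hold a different message under the same UID, and syncing by UID copies one over the other. With `guid_mode: sha1` each message also carries a SHA1-based GUID, which is enough to detect the clash. The Python below is a hypothetical sketch of that detection only: mailboxes are modelled as dicts from UID to raw message bytes, and `message_guid` and `conflicting_uids` are made-up names. It is not FastMail's patch or the new sync protocol, and it does not resolve the conflict.

```python
import hashlib
from typing import Dict, List


def message_guid(raw: bytes) -> str:
    """SHA1 hex digest of the raw message, standing in for a sha1 GUID."""
    return hashlib.sha1(raw).hexdigest()


def conflicting_uids(master: Dict[int, bytes],
                     replica: Dict[int, bytes]) -> List[int]:
    """Return UIDs present on both sides whose message content differs.

    These are the split-brain cases: copying one side over the other
    by UID (i.e. by filename) would silently destroy a message.
    """
    shared = sorted(set(master) & set(replica))
    return [uid for uid in shared
            if message_guid(master[uid]) != message_guid(replica[uid])]


if __name__ == "__main__":
    # UID 3 reached the master but was never replicated before the crash;
    # after failover the replica delivered a different message as UID 3.
    master = {1: b"msg one", 2: b"msg two", 3: b"unsynced on master"}
    replica = {1: b"msg one", 2: b"msg two", 3: b"new delivery on replica"}
    print(conflicting_uids(master, replica))  # [3]
```

A real fix presumably has to keep both copies (for example by giving one of them a new UID); the sketch only shows why the GUID makes the clash visible.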

> Maybe someone can describe a method of switching between replicated
> backends in production? For now my plan is to switch DNS and then
> start/stop the sync_client daemon.

We do have slightly different configurations, so we have to shut down
both ends.  In future I plan to have sync_client running at both ends,
so it's master-master, but with DNS only pointing at one end, and some
sort of "barrier" process where we kill off connections before switching.

The barrier is needed if you don't want to be in split-brain recovery
mode ALL the time, because some clients hold IMAP connections open for
days.
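The ordering behind the "barrier" can be modelled in a few lines. This is a toy sketch, not FastMail's tooling: `Backend`, `switch_over` and the dict standing in for DNS are all hypothetical, and a real barrier would presumably also have to wait for replication to catch up. It shows only the sequence described above: stop new sessions on the old end, kill the existing (possibly days-old) IMAP connections, and only then point the name at the other end, so no client keeps writing to both ends.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Backend:
    name: str
    accepting: bool = True
    connections: Set[int] = field(default_factory=set)

    def connect(self, conn_id: int) -> bool:
        """Accept a client connection unless the backend is draining."""
        if not self.accepting:
            return False
        self.connections.add(conn_id)
        return True


def switch_over(old: Backend, new: Backend,
                dns: Dict[str, str], name: str) -> List[int]:
    """Hypothetical barrier: drain the old end, then repoint the name.

    Returns the IDs of the connections that were killed.
    """
    old.accepting = False             # 1. refuse new sessions on the old end
    killed = sorted(old.connections)  # 2. long-lived IMAP sessions...
    old.connections.clear()           #    ...are dropped, not left writing
    new.accepting = True
    dns[name] = new.name              # 3. flip the service name last
    return killed


if __name__ == "__main__":
    a, b = Backend("store1a"), Backend("store1b", accepting=False)
    dns = {"imap": "store1a"}
    a.connect(1)
    a.connect(2)
    print(switch_over(a, b, dns, "imap"), dns)
```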

Bron.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Cyrus replication and failover best practices

2010-08-09 Thread Dmitry Ivanov
Hello!
Folks, looking through the mailing list history I saw that many of you
are running Cyrus in rolling replication mode. I am interested in
configuring a Cyrus replica as a standby IMAP server, so that we can
switch DNS to it in case of problems with the primary backend. While
testing in a sandbox I ran into some problems and several questions
came up; maybe you can help me solve them.

1. Is it safe to leave the "sync_host:" option in imapd.conf and run
sync_server (via its entry in cyrus.conf) on both master and replica,
and start sync_client -r only on the master server? Or is it better to
have different config files for the different roles?

2. Is there any way to solve the issue where the master overwrites
messages with the same filename on the replica (messages that were not
synced before the disaster happened) while syncing back to the primary
host? "guid_mode: sha1" is set.

Maybe someone can describe a method of switching between replicated
backends in production? For now my plan is to switch DNS and then
start/stop the sync_client daemon.

Thank you for your assistance!

-- 

Dmitry S. Ivanov


Re: Replication and failover

2007-05-11 Thread Bron Gondwana
On Thu, May 10, 2007 at 12:14:44PM -0400, Nik Conwell wrote:
> Do you have separate IP addresses for each instance of cyrus on the  
> machine as well, or just the machine itself?  If just the machine,  
> what 'names' does the front-end know the back-end instances by?

Every store has an IP address for the master (a.b.10.$storenumber) and
one for the replica (a.b.11.$storenumber), which map to hosts file
entries (yay templating), so you can just refer to store6m.internal
to connect to the master IP address for store6.  Slots themselves
don't have any IP addresses.  Machines have their own base IP address,
and you can find them with, for example:

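use strict;
use warnings;
use ME::ImapStore;    # site-specific module from this setup, not publicly available

my $storename = 'store6';
my $store = ME::ImapStore->new($storename);
# note, does DB lookup (cached for 5 seconds)
my $slot = $store->MasterSlot();
my $server = $slot->Machine();
my $ip = $server->InternalAddress();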

and if you don't have perl you can always invoke it or write a small
Template::Toolkit script to spit out what you want.
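The addressing scheme above lends itself to a generated hosts file. Here is a minimal Python sketch of that templating (the post says FastMail does it with Template::Toolkit; this is not their template). The network prefix argument stands in for the elided `a.b`, and the `store<N>r` replica naming is an assumption, since only the `store6m.internal` master form is mentioned.

```python
from typing import Iterable, List


def hosts_entries(prefix: str, store_numbers: Iterable[int],
                  domain: str = "internal") -> List[str]:
    """Render hosts-file lines for each store's master and replica IPs.

    Master IPs are <prefix>.10.<n> and replica IPs <prefix>.11.<n>, as in
    the scheme described above. The 'r' suffix for replica names is an
    assumption; only the 'm' form (store6m) appears in the post.
    """
    lines = []
    for n in store_numbers:
        if not 0 <= n <= 255:
            raise ValueError(f"store number {n} does not fit in an octet")
        lines.append(f"{prefix}.10.{n}\tstore{n}m.{domain}")
        lines.append(f"{prefix}.11.{n}\tstore{n}r.{domain}")
    return lines


if __name__ == "__main__":
    # "10.0" is a placeholder for the elided a.b network prefix.
    print("\n".join(hosts_entries("10.0", [6])))
```

Because the name-to-address mapping is derived from the store number alone, every machine can carry the same generated file.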

Bron.


Re: Replication and failover

2007-05-10 Thread Nik Conwell


On Jan 18, 2007, at 5:35 PM, Rob Mueller wrote:

>> Attached are our operations group's notes on the subject.  It makes
>> reference to the tool we use to manage the OS of the machines
>> (radmind), but it should be pretty clear what they are talking about
>> without any radmind knowledge.
>
> As an FYI, we have a similar procedure to this; the main
> differences are:
>
> 1. We don't change the DNS. Instead we give each machine a primary
> IP address, but we also create IP addresses for "cyrusXmaster" and
> "cyrusXreplica" names (where X is the machine's number). When we
> swap roles, we rebind the different IPs to the particular machines
> and send ARPs to flush the stale entries from the router's ARP table,
> rather than changing the DNS. This means you can always access the
> master as "cyrusXmaster" from every machine without having to worry
> about DNS getting out of sync.
> 2. Every machine has cyrus-master.conf, cyrus-replica.conf,
> imapd-master.conf and imapd-replica.conf. We just symlink cyrus.conf
> and imapd.conf to the appropriate file depending on what mode the
> machine is currently in.


Do you have separate IP addresses for each instance of cyrus on the  
machine as well, or just the machine itself?  If just the machine,  
what 'names' does the front-end know the back-end instances by?


FWIW we use IP names for our 17 back-end UW mailstores...

Thanks.
-nik



Re: Replication and failover

2007-01-18 Thread Rob Mueller

> Attached are our operations group's notes on the subject.  It makes
> reference to the tool we use to manage the OS of the machines
> (radmind), but it should be pretty clear what they are talking about
> without any radmind knowledge.


As an FYI, we have a similar procedure to this; the main differences are:

1. We don't change the DNS. Instead we give each machine a primary IP
address, but we also create IP addresses for "cyrusXmaster" and
"cyrusXreplica" names (where X is the machine's number). When we swap
roles, we rebind the different IPs to the particular machines and send ARPs
to flush the stale entries from the router's ARP table, rather than changing
the DNS. This means you can always access the master as "cyrusXmaster" from
every machine without having to worry about DNS getting out of sync.
2. Every machine has cyrus-master.conf, cyrus-replica.conf,
imapd-master.conf and imapd-replica.conf. We just symlink cyrus.conf and
imapd.conf to the appropriate file depending on what mode the machine is
currently in.
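The symlink half of item 2 could be scripted along these lines. A minimal sketch, assuming the four files sit side by side in one directory (the directory is passed in, so nothing here assumes /etc) and that the roles are exactly "master" and "replica"; `set_role` is a hypothetical helper. Each link is swapped atomically by creating a temporary symlink and renaming it over the old one. Rebinding the service IPs and sending ARPs is left out, since that depends on the OS tooling.

```python
import os
import tempfile
from pathlib import Path

CONFIGS = ("cyrus.conf", "imapd.conf")
ROLES = ("master", "replica")


def set_role(etc: Path, role: str) -> None:
    """Point cyrus.conf and imapd.conf at their -master/-replica variants.

    A temporary symlink is created next to each config and renamed over
    the old name, so there is never a moment with no cyrus.conf at all.
    """
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    for name in CONFIGS:
        target = name.replace(".conf", f"-{role}.conf")  # e.g. imapd-replica.conf
        if not (etc / target).is_file():
            raise FileNotFoundError(etc / target)
        tmp = etc / f".{name}.tmp"
        if tmp.is_symlink() or tmp.exists():
            tmp.unlink()
        os.symlink(target, tmp)      # relative link, resolved inside etc
        os.replace(tmp, etc / name)  # rename over the old link in one step


if __name__ == "__main__":
    # Demonstrate on a scratch directory rather than the live config.
    with tempfile.TemporaryDirectory() as d:
        etc = Path(d)
        for variant in ("cyrus-master", "cyrus-replica",
                        "imapd-master", "imapd-replica"):
            (etc / f"{variant}.conf").write_text(f"# {variant}\n")
        set_role(etc, "replica")
        print(os.readlink(etc / "imapd.conf"))  # imapd-replica.conf
```

Using relative link targets keeps the links valid if the directory is ever copied or mounted elsewhere.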


Rob



Re: Replication and failover

2007-01-18 Thread Wesley Craig

On 18 Jan 2007, at 05:41, Janne Peltonen wrote:

> Is there documentation about replication failover scenarios anywhere? I
> can, of course, conjure up a thing or two, but I'd like to see how other
> people have resolved 'corrupted mailspool -> services to the replica ->
> maintenance -> resync master -> services back to the master' situations.
>
> I did a short Google, but didn't find anything of note.


Attached are our operations group's notes on the subject.  It makes  
reference to the tool we use to manage the OS of the machines  
(radmind), but it should be pretty clear what they are talking about  
without any radmind knowledge.


:wes

1. Establish primary failure
we believe that the failover procedure should take approximately 30
minutes, so it should be invoked whenever the estimated downtime on the
primary would exceed that.
an exception may be made if there is reason to believe that a substantial
amount of data on the failed primary was not synced to the replica; we will
discuss the feasibility of sanity checks that can be run prior to failover.
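Step 1's rule of thumb can be written down as a function. A minimal sketch: the 30-minute figure comes from the notes, while `should_fail_over` and its two inputs (an estimated downtime in minutes and a yes/no judgement about unsynced data) are hypothetical. It treats the exception as a veto, whereas the notes only say an exception *may* be made, so a person should still make the final call.

```python
# The notes estimate that the failover procedure itself takes ~30 minutes.
FAILOVER_MINUTES = 30


def should_fail_over(estimated_downtime_min: float,
                     substantial_unsynced_data: bool) -> bool:
    """Fail over only if waiting for the primary would take longer than
    failing over, and no large amount of mail is suspected to be stranded
    on the primary (in that case waiting for it may lose less)."""
    if substantial_unsynced_data:
        return False
    return estimated_downtime_min > FAILOVER_MINUTES


if __name__ == "__main__":
    print(should_fail_over(120, False))  # True: a 2-hour outage
    print(should_fail_over(10, False))   # False: quicker to wait
```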

2. Stop cyrus/sync_client on the primary if necessary / remove the primary from the network if necessary
/etc/init.d/cyrus stop
/etc/init.d/sync_client stop
/etc/init.d/network stop (or unplug network cable)

3. stop cyrus on the replica
/etc/init.d/cyrus stop

4. Change dns so that the name of -repl becomes 
-> ensure you change forward and reverse
-> leave original entries commented out

5. Verify dns changes are working by checking on truelies
dig .mail
dnsrev the ip
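Step 5 checks both directions by hand with `dig` and a reverse lookup. The same check can be sketched in Python, assuming you pass the name and the replica's IP. `dns_switched` is a hypothetical helper, and the resolver functions are injectable so the logic can be tried without touching real DNS. By default it uses the system resolver, which may return cached answers, so `dig` against the authoritative server remains the stronger test.

```python
import socket
from typing import Callable, List, Tuple

Forward = Callable[[str], str]
Reverse = Callable[[str], Tuple[str, List[str], List[str]]]


def _norm(host: str) -> str:
    """DNS names compare case-insensitively; ignore a trailing dot."""
    return host.rstrip(".").lower()


def dns_switched(name: str, expected_ip: str,
                 forward: Forward = socket.gethostbyname,
                 reverse: Reverse = socket.gethostbyaddr) -> bool:
    """True if name resolves to expected_ip and expected_ip maps back to name."""
    try:
        if forward(name) != expected_ip:
            return False
        primary, aliases, _addrs = reverse(expected_ip)
    except OSError:  # socket.gaierror and socket.herror subclass OSError
        return False
    return _norm(name) in {_norm(h) for h in [primary, *aliases]}
```

For example, `dns_switched("imap.example.edu", "192.0.2.10")` (hypothetical name and address) should return True only once both the forward and the reverse records have been changed.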

6. Put special files of -repl in place for  to reflect ip 
information of replica
cd to special dir (generally /var/radmind/special/imap)
cp -R  .save
cp -repl/etc/sysconfig/network /etc/sysconfig/network
cp -repl/etc/sysconfig/network-devices/ifconfig.eth0 /etc/sysconfig/network-devices
edit network to fix hostname
vi /etc/sysconfig/network
  
7. radmind the replica
ra.sh update

Update command file and/or transcripts? [Yn] y
/var/radmind/client/command.K: updated
/var/radmind/client/special.T: updated
c ./dev/ttyS0 0600 0 0 4 64
special.T:
+ f ./etc/adsm/TSM.PWD 0444 0 0 1093046900 164 TIgISWWzEESwLKsM5TQx4CRH1hc=
imap/imap-23backend.T:
+ f ./etc/cyrus.conf 0644 0 0 1156541554 1380 HqMdPv649xvUptagZY1X489CCpo=
imap/imap.T:
+ f ./etc/imapd.common.conf 0644 0 0 1119845235 871 kTjkwR4x0SwRuK3qvpKi2ZGwANU=
imap/imap-23backend.T:
+ f ./etc/imapd.conf 0644 0 0 1155789187 343 RIr24APHrHa8fp6YTCezsGUCK4U=
special.T:
+ f ./etc/imapd.host.conf 0444 0 0 1156186085 104 RIgobQuTFI/HRQNmF4H4WEEoU1I=
+ f ./etc/krb5.keytab 0640 0 25 1093051728 952 hk7wwXNZgVqyiPgB8BQ55fGtULg=
+ f ./etc/sysconfig/network 0644 0 0 1166473054 81 pfuFsI4FuD763RKzCIXMHojQadc=
+ f ./etc/sysconfig/network-devices/ifconfig.eth0 0644 0 0 1166473074 78 yXkW7BokmxryTqqJKLmFl9zc3Qs=
+ f ./etc/sysconfig/network-devices/ifconfig.eth1 0444 0 0 1166473075 71 yvCcuy3ATic/4AXPPVa1zeoPnbo=
- f ./opt/tivoli/tsm/client/ba/bin/dsm.sys 0644 0 0 1164130511 418 -
+ f ./var/imap/hostname.pem 0444 0 0 1155787168 2920 Hyfrb/Sg4WkWHp/dUYHe8q9/cv4=


8.  /etc/init.d/network restart
hostname  (remember to use fqdn)
pkill syslogd ksyslogd

or reboot (your choice)

9. start cyrus
su cyrus
(get tickets)
/usr/local/heimdal-k5/bin/kinit -k -l 25h imap/[EMAIL PROTECTED]
ctl_mboxlist -m -w
(no output is good!!!)

(exit so you are root)
init 3

10. comment out replnag  until new replica is brought up

11. restart nefu to catch ip change

*** bringing up a new replica, hopefully on the same hardware ***

1. update DNS for new replica

2. set up special files of -repl
cd to special dir (generally /var/radmind/special/imap)
cp -R -repl -repl.save
cp .save/etc/sysconfig/network -repl/etc/sysconfig/network
cp .save/etc/sysconfig/network-devices/ifconfig.eth0 -repl/etc/sysconfig/network-devices
edit network to fix hostname
vi -repl/etc/sysconfig/network

3. reload new replica with existing command file

4. boot new replica & start cyrus

5. generate list of mailboxes & sync
to get mailboxes
ctl_mboxlist -d > /tmp/users
awk '{ print $1 }' /tmp/users | xargs sync_client -v -l -m 
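The `ctl_mboxlist | awk | xargs` pipeline in step 5 can also be expressed in Python if you want more control over batching or logging. This sketch only builds the argument lists (running them, e.g. with `subprocess.run`, is left to the caller). `mailbox_names` and `sync_commands` are hypothetical helpers, the batch size of 100 is an arbitrary assumption, and like the `awk` version it takes the mailbox name to be the first whitespace-separated field of each `ctl_mboxlist -d` line, so mailbox names containing spaces would break both.

```python
from typing import Iterable, List


def mailbox_names(dump: str) -> List[str]:
    """First whitespace-separated field of each non-empty line, which is
    what `awk '{ print $1 }'` extracts from the ctl_mboxlist -d output."""
    return [line.split()[0] for line in dump.splitlines() if line.strip()]


def sync_commands(names: Iterable[str], batch: int = 100) -> List[List[str]]:
    """Group mailbox names into `sync_client -v -l -m ...` invocations,
    roughly as xargs would (xargs sizes batches by argument length)."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    names = list(names)
    return [["sync_client", "-v", "-l", "-m", *names[i:i + batch]]
            for i in range(0, len(names), batch)]


if __name__ == "__main__":
    with open("/tmp/users") as f:  # produced by: ctl_mboxlist -d > /tmp/users
        for argv in sync_commands(mailbox_names(f.read())):
            print(" ".join(argv))
```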

6. start sync client


*** switch back during next maintenance window ***

1. stop cyrus on primary
init 2

2. verify that /var/imap/sync is empty (no pending syncs); if not, run
sync_client -v -l -r -f 
on each remaining log file, deleting each file after it has been synced
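Step 2 is easy to get wrong by hand when several log files are left over. A minimal sketch of the same loop, assuming every regular file in the sync directory is a pending log and that Cyrus on the primary is already stopped (step 1), so nothing is still appending to them. `drain_sync_logs` is a hypothetical helper; the `sync_client -v -l -r -f` flags are the ones from the notes, and the runner is injectable so the loop can be dry-run.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Default runner: execute argv and return its exit status."""
    return subprocess.run(list(argv), check=False).returncode


def drain_sync_logs(sync_dir: Path, runner: Runner = run_command) -> List[Path]:
    """Replay each leftover log with `sync_client -v -l -r -f <file>`.

    A file is deleted only if sync_client exits 0; failures are left in
    place and returned so they can be retried or inspected.
    """
    failed: List[Path] = []
    for log in sorted(p for p in sync_dir.iterdir() if p.is_file()):
        status = runner(["sync_client", "-v", "-l", "-r", "-f", str(log)])
        if status == 0:
            log.unlink()
        else:
            failed.append(log)
    return failed


if __name__ == "__main__":
    # Dry run against a scratch directory with a fake runner.
    def dry_run(argv: Sequence[str]) -> int:
        print("would run:", " ".join(argv))
        return 0

    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "log").write_text("placeholder\n")
        print("failed:", drain_sync_logs(Path(d), dry_run))
```

Against the live directory this would be called as `drain_sync_logs(Path("/var/imap/sync"))`; an empty return value corresponds to the "no pending syncs" condition above.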

3. swap DNS

4. move specials back into place

Replication and failover

2007-01-18 Thread Janne Peltonen
Hi!

Is there documentation about replication failover scenarios anywhere? I
can, of course, conjure up a thing or two, but I'd like to see how other
people have resolved 'corrupted mailspool -> services to the replica ->
maintenance -> resync master -> services back to the master' situations.
I did a short Google, but didn't find anything of note.


--Janne Peltonen
Email admin
Univ. of Helsinki
