Re: 2.3.1 replication and deliver problem
- Original Message -
From: "Patrick H Radtke" <[EMAIL PROTECTED]>
To: "Dmitry Melekhov" <[EMAIL PROTECTED]>
Cc:
Sent: Friday, February 03, 2006 6:40 PM
Subject: Re: 2.3.1 replication and deliver problem

> On Fri, 3 Feb 2006, Dmitry Melekhov wrote:
>> Patrick Radtke wrote:
>>> Maybe check the log on your replica. Possibly something is going wrong with sync_server (though it seems unlikely since sync_client -u works)
>> Yes. Something is wrong with sync_server.
>> Feb 3 09:10:28 backup syncserver[1899]: Fatal error: Virtual memory exhausted
> I saw this problem quite some time ago. What operating system are you running? How much memory do you have?

This is Novell SLES9 (something like SuSE 9.1); the machine is a single P4 with 2 GB of RAM.

> Anyhow, as a work around we did two different things.

Thank you, I'll try these workarounds. More interesting, certainly, is to find the bug... ;-)
Unfortunately, I have almost no time, but I'll read the sources...

Thank you!

Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: 2.3.1 replication and deliver problem
Vincent Deffontaines wrote:
> Actually, my permission problem must be different from Dmitry's problem.
> I just tried running sync_client by hand and it quits:
> $ /usr/cyrus/bin/sync_client -v -u username
> Error from send_lock(): bailing out!
>
> Syslog says:
> sync_client[21692]: starttls: TLSv1 with cipher AES256-SHA (256/256 bits new) no authentication
> sync_client[21692]: LOCK received NO response: Permission denied
>
> Is the "no authentication" a problem? I would think not, as the replica's syslog says:
> syncserver[13455]: login: master [10.1.32.141] cyrus-admin PLAIN+TLS User logged in
>
> But then, I don't know what the permission problem is...
>
> Thanks,
> Vincent

It works now. For the record, I had to set
"admins: cyrus-admin"
in imapd.conf, not
"admin: cyrus-admin"
No comment ;)

Vincent
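The fix above came down to one misspelled option name. As a rough illustration only, here is a small Python sketch that scans the text of an imapd.conf for that slip. The helper names `parse_imapd_conf` and `check_admins` are hypothetical and not part of Cyrus; the sketch assumes the plain "option: value" layout with "#" comments.

```python
def parse_imapd_conf(text):
    """Parse 'option: value' lines, skipping blank lines and '#' comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            options[name.strip()] = value.strip()
    return options


def check_admins(text):
    """Return a list of problems with the admin setting (hypothetical helper)."""
    options = parse_imapd_conf(text)
    problems = []
    if "admin" in options:
        # The option that fixed the problem in this thread is 'admins'.
        problems.append("found 'admin'; the option is spelled 'admins'")
    if "admins" not in options:
        problems.append("no 'admins' option set")
    return problems


if __name__ == "__main__":
    for problem in check_admins("admin: cyrus-admin\n"):
        print(problem)
```

Run against both the master's and the replica's imapd.conf, a check like this would have flagged the "admin:" line before the Permission denied errors had to be chased through syslog.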
Re: 2.3.1 replication and deliver problem
On Fri, 3 Feb 2006, Dmitry Melekhov wrote:
> Patrick Radtke wrote:
>> Maybe check the log on your replica. Possibly something is going wrong with sync_server (though it seems unlikely since sync_client -u works)
> Yes. Something is wrong with sync_server.
> Feb 3 09:10:28 backup syncserver[1899]: Fatal error: Virtual memory exhausted
> It hangs during replication of my mailbox which is about 600 MB...
> Is it possible to get more verbose debug from sync_server?

I saw this problem quite some time ago. What operating system are you running? How much memory do you have?

You may want to file a bug report (if there isn't one already). When we saw the problem during the summer, Ken was unable to duplicate the situation, which made it hard to test possible solutions. We weren't sure if it was something wrong with our system or with sync_server.

Anyhow, as a work around we did two different things.

1. Replicate the user a bunch of times. It will fail each time, but should get further along until it succeeds.

2. If sync_server fails, I think it leaves some files around that it may or may not clean up. On the replica, shut down cyrus; in the directory that has your mail (e.g. /var/spool/imap) there should be a folder called .sync. I found removing this sometimes helped.

3. If replication is stuck on a specific mailbox, then use cyradm to create the mailbox on the replica, scp the user's files over, and run reconstruct. Then run sync_client and it should clean up any discrepancies.

The good news is that the problem only seems to occur for the initial copying of data. Staying in sync afterwards never seems to trigger the bug.

-Patrick
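Workaround 1 above can be sketched as a simple retry loop. This is only an illustration under stated assumptions: the binary path is the one quoted elsewhere in this thread and may differ on your system, the function name `replicate_until_done` is hypothetical, and it assumes sync_client returns a non-zero exit status when it bails out.

```python
import subprocess
import time

# Path as quoted elsewhere in this thread; adjust for your installation.
SYNC_CLIENT = "/usr/cyrus/bin/sync_client"


def replicate_until_done(user, max_attempts=10, delay=5.0,
                         run=subprocess.call, sleep=time.sleep):
    """Re-run 'sync_client -v -u USER' until it exits 0.

    Returns the number of the attempt that succeeded, or None if every
    attempt failed. 'run' and 'sleep' are injectable so the loop can be
    exercised without a Cyrus installation.
    """
    for attempt in range(1, max_attempts + 1):
        status = run([SYNC_CLIENT, "-v", "-u", user])
        if status == 0:
            return attempt
        if attempt < max_attempts:
            sleep(delay)  # give the replica a moment before retrying
    return None


if __name__ == "__main__":
    result = replicate_until_done("dm")
    print("succeeded on attempt", result if result else "(never)")
```

Each failed pass should leave the replica a little further along, so a bounded number of attempts is used here rather than looping forever; if the limit is reached, workarounds 2 and 3 are the next things to try.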
Re: 2.3.1 replication and deliver problem
Patrick Radtke wrote:
> Maybe check the log on your replica. Possibly something is going wrong with sync_server (though it seems unlikely since sync_client -u works)

Yes. Something is wrong with sync_server.

Feb 3 09:10:28 backup syncserver[1899]: Fatal error: Virtual memory exhausted

It hangs during replication of my mailbox which is about 600 MB...
Is it possible to get more verbose debug from sync_server?
Re: 2.3.1 replication and deliver problem
Vincent Deffontaines wrote:
> Greetings,
>
> I am experiencing problems quite similar to those of Dmitry. I had the same TLS setup problem, now fixed. If there is interest, I can patch the replication documentation so that other users get it working straight away.

Any additions/clarifications to the documentation are always welcome.

--
Kenneth Murchison
Project Cyrus Developer/Maintainer
Carnegie Mellon University
Re: 2.3.1 replication and deliver problem
Actually, my permission problem must be different from Dmitry's problem.
I just tried running sync_client by hand and it quits:
$ /usr/cyrus/bin/sync_client -v -u username
Error from send_lock(): bailing out!

Syslog says:
sync_client[21692]: starttls: TLSv1 with cipher AES256-SHA (256/256 bits new) no authentication
sync_client[21692]: LOCK received NO response: Permission denied

Is the "no authentication" a problem? I would think not, as the replica's syslog says:
syncserver[13455]: login: master [10.1.32.141] cyrus-admin PLAIN+TLS User logged in

But then, I don't know what the permission problem is...

Thanks,
Vincent

Vincent Deffontaines wrote:
> Greetings,
>
> I am experiencing problems quite similar to those of Dmitry. I had the same TLS setup problem, now fixed. If there is interest, I can patch the replication documentation so that other users get it working straight away.
>
> Now, here is what I get in my syslogs:
>
> o Master server:
> Feb 2 09:25:25 master sync_client[25787]: Doing a peer verify
> Feb 2 09:25:25 master sync_client[25787]: Doing a peer verify
> Feb 2 09:25:25 master sync_client[25787]: received server certificate
> Feb 2 09:25:25 master sync_client[25787]: starttls: TLSv1 with cipher AES256-SHA (256/256 bits new) no authentication
> [snip]
> Feb 2 09:35:26 master sync_client[25432]: RESTART received NO response: Permission denied
> Feb 2 09:35:26 master sync_client[25432]: sync_client RESTART failed
>
> That seems to be the last message I get from sync_client. It no longer runs now.
Re: 2.3.1 replication and deliver problem
Greetings,

I am experiencing problems quite similar to those of Dmitry. I had the same TLS setup problem, now fixed. If there is interest, I can patch the replication documentation so that other users get it working straight away.

Now, here is what I get in my syslogs:

o Master server:
Feb 2 09:25:25 master sync_client[25787]: Doing a peer verify
Feb 2 09:25:25 master sync_client[25787]: Doing a peer verify
Feb 2 09:25:25 master sync_client[25787]: received server certificate
Feb 2 09:25:25 master sync_client[25787]: starttls: TLSv1 with cipher AES256-SHA (256/256 bits new) no authentication
[snip]
Feb 2 09:35:26 master sync_client[25432]: RESTART received NO response: Permission denied
Feb 2 09:35:26 master sync_client[25432]: sync_client RESTART failed

That seems to be the last message I get from sync_client. It no longer runs now.

o Replica:
Feb 2 09:19:49 replica syncserver[26003]: executed
Feb 2 09:19:49 replica syncserver[28997]: mystore: starting txn 2147483653
Feb 2 09:19:49 replica syncserver[28997]: mystore: committing txn 2147483653
Feb 2 09:19:49 replica syncserver[28997]: starttls: TLSv1 with cipher AES256-SHA (256/256 bits new) no authentication
Feb 2 09:19:49 replica syncserver[28997]: login: master [10.1.32.141] cyrus-admin PLAIN+TLS User logged in
Feb 2 09:24:46 replica syncserver[26003]: accepted connection
Feb 2 09:24:46 replica syncserver[26003]: cmdloop(): startup
Feb 2 09:24:47 replica syncserver[26003]: mystore: starting txn 2147483656
Feb 2 09:24:47 replica syncserver[26003]: mystore: committing txn 2147483656
Feb 2 09:24:47 replica syncserver[26003]: starttls: TLSv1 with cipher AES256-SHA (256/256 bits new) no authentication
Feb 2 09:24:47 replica syncserver[26003]: login: master [10.1.32.141] cyrus-admin PLAIN+TLS User logged in

So, I suppose the problem is about the "permission denied" raised in the master log?
I have "admin: cyrus-admin" in both imapd.confs.

I have attached an strace of the sync_client process.
It shows many repetitions of:

## BEGIN WHILE(1)
time(NULL) = 1138871181
stat64("/var/lib/cyrus/sync/log", 0xbf983fac) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0}) = 0
##END WHILE(1)

then, after dozens of minutes:

time(NULL) = 1138871669
write(5, "\27\3\1\0 S\16\316\344\226%\225B\332cA\t\367HG\317\222"..., 37) = 37
time(NULL) = 1138871669
read(5, "\27\3\1\", 5) = 5
read(5, "\340\204\255b0t*\350\203\21\2X\223Y\342\342\242\375\331"..., 48) = 48
time([1138871669]) = 1138871669
getpid() = 21510
rt_sigaction(SIGPIPE, {0xb7d04a70, [], SA_RESTORER, 0xb7c58a18}, {SIG_DFL}, 8) = 0
send(6, "<179>Feb 2 10:14:29 sync_client"..., 88, 0) = 88
rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
time([1138871669]) = 1138871669
getpid() = 21510
rt_sigaction(SIGPIPE, {0xb7d04a70, [], SA_RESTORER, 0xb7c58a18}, {SIG_DFL}, 8) = 0
send(6, "<179>Feb 2 10:14:29 sync_client"..., 67, 0) = 67
rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
exit_group(1)

Any help will be appreciated.

Vincent Deffontaines

PS: sorry if this is starting a new thread; I wanted to reply to the thread opened by Dmitry, but I just subscribed to the ML.
PS2: in case this matters, I have applied the Athens university autocreate patch on both master and replica. Bash me if this is stupid :)
Re: 2.3.1 replication and deliver problem
On Jan 31, 2006, at 4:06 AM, Dmitry Melekhov wrote:
> David Carter wrote:
>> On Tue, 31 Jan 2006, Dmitry Melekhov wrote:
>>> This is what I see.
>>> Promoting: MAILBOX user.dm -> USER dm
>>> Error in do_sync(): bailing out!
>>> Not a very informative message...
>> syslog should tell you why it decided to bail out.
> Unfortunately I see in the log (i.e. -l) only what I see on the console with -v.

Maybe check the log on your replica. Possibly something is going wrong with sync_server (though it seems unlikely since sync_client -u works)

For debugging, you could try setting '-w 60' and then attaching gdb to the running process. -w 60 makes sync_client wait 60 seconds before processing the log file.

-Patrick
Re: 2.3.1 replication and deliver problem
David Carter wrote:
> On Tue, 31 Jan 2006, Dmitry Melekhov wrote:
>> This is what I see.
>> Promoting: MAILBOX user.dm -> USER dm
>> Error in do_sync(): bailing out!
>> Not a very informative message...
> syslog should tell you why it decided to bail out.

Unfortunately I see in the log (i.e. -l) only what I see on the console with -v.
Re: 2.3.1 replication and deliver problem
On Tue, 31 Jan 2006, Dmitry Melekhov wrote:
> This is what I see.
> Promoting: MAILBOX user.dm -> USER dm
> Error in do_sync(): bailing out!
> Not a very informative message...

syslog should tell you why it decided to bail out.

--
David Carter                             Email: [EMAIL PROTECTED]
University Computing Service,            Phone: (01223) 334502
New Museums Site, Pembroke Street,       Fax:   (01223) 334679
Cambridge UK. CB2 3QH.
Re: 2.3.1 replication and deliver problem
Patrick H Radtke wrote:
>> But it is not completely clear how sync_client -r replication works.
>> I see that this process just disappears after some time and I see no replication at all... but sync_client -u replication works OK..
> sync_client -r should stay running. Try running it with -l -r and it will log what it's doing to your log file.
> You have 'sync_log: 1' in your imapd.conf file?
> When sync_client stops running then it should log the reason and 'bailing out' to the log file.

Hello!

This is what I see.
Promoting: MAILBOX user.dm -> USER dm
Error in do_sync(): bailing out!

Not a very informative message...
Is there any way to debug sync_client?
Re: 2.3.1 replication and deliver problem
> But it is not completely clear how sync_client -r replication works.
> I see that this process just disappears after some time and I see no replication at all... but sync_client -u replication works OK..

sync_client -r should stay running. Try running it with -l -r and it will log what it's doing to your log file.

You have 'sync_log: 1' in your imapd.conf file?

When sync_client stops running then it should log the reason and 'bailing out' to the log file.

-Patrick
Re: 2.3.1 replication and deliver problem
Patrick H Radtke wrote:
> I don't think your replica machine ('backup') is configured correctly

Thank you! You were right. Now sync_client connects to the server correctly.

But it is not completely clear how sync_client -r replication works.
I see that this process just disappears after some time and I see no replication at all... but sync_client -u replication works OK..

> On your replica do you have lines like:
> tls_cert_file: /var/cyrus/mail.pem
> tls_key_file: /var/cyrus/mail.pem
> tls_ca_file: /var/cyrus/GeoTrustCA.pem
> sasl_mech_list: PLAIN
>
> Can you connect to the replica on the imap port with STARTTLS and PLAIN? Try using imtest to find out.
> imtest -u cyrus -a cyrus -t "" -m PLAIN liverwurst2
Re: 2.3.1 replication and deliver problem
On Thu, 19 Jan 2006, Dmitry Melekhov wrote:
> Patrick H Radtke wrote:
>> So the connection works, but there is probably a problem with the authentication. What authentication mechanism are you using for sync_client?
>> sync_server should advertise the available SASL mechanisms. If you are using PLAIN then you need to have a certificate, so STARTTLS will work prior to sending a password.
>>
>> Here's what one of our replicas advertises:
>> telnet liverwurst2 2005
>> Trying 128.59.33.151...
>> Connected to liverwurst2.cc.columbia.edu (128.59.33.151).
>> Escape character is '^]'.
>> * SASL GSSAPI
>> * STARTTLS
>> * OK liverwurst2.cc.columbia.edu Cyrus sync server v2.3-alpha
> Sorry, I don't understand completely. I have a certificate and use it for imaps.
> And, yes, I use PLAIN. What do I need to write in imapd.conf?

I don't think your replica machine ('backup') is configured correctly.

On your replica do you have lines like:
tls_cert_file: /var/cyrus/mail.pem
tls_key_file: /var/cyrus/mail.pem
tls_ca_file: /var/cyrus/GeoTrustCA.pem
sasl_mech_list: PLAIN

Can you connect to the replica on the imap port with STARTTLS and PLAIN? Try using imtest to find out.
imtest -u cyrus -a cyrus -t "" -m PLAIN liverwurst2

-Patrick
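A quick way to answer the "do you have lines like" question is to list which of the four options quoted above are absent from the replica's imapd.conf. The following Python sketch is illustrative only: `missing_replica_options` is a hypothetical helper, and treating exactly these four options as the set to check is an assumption drawn from this message, not a statement of what Cyrus requires.

```python
# The four options quoted in the message above (assumed checklist).
EXPECTED_OPTIONS = ("tls_cert_file", "tls_key_file", "tls_ca_file", "sasl_mech_list")


def missing_replica_options(conf_text, expected=EXPECTED_OPTIONS):
    """Return the expected option names that do not appear in conf_text."""
    present = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, _value = line.partition(":")
        if sep:
            present.add(name.strip())
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    sample = "tls_cert_file: /var/cyrus/mail.pem\nsasl_mech_list: PLAIN\n"
    print(missing_replica_options(sample))
```

An empty result only shows the options are present; the imtest command above is still the real test of whether STARTTLS and PLAIN work.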
Re: 2.3.1 replication and deliver problem
Patrick H Radtke wrote:
> So the connection works, but there is probably a problem with the authentication. What authentication mechanism are you using for sync_client?
> sync_server should advertise the available SASL mechanisms. If you are using PLAIN then you need to have a certificate, so STARTTLS will work prior to sending a password.
>
> Here's what one of our replicas advertises:
> telnet liverwurst2 2005
> Trying 128.59.33.151...
> Connected to liverwurst2.cc.columbia.edu (128.59.33.151).
> Escape character is '^]'.
> * SASL GSSAPI
> * STARTTLS
> * OK liverwurst2.cc.columbia.edu Cyrus sync server v2.3-alpha

Sorry, I don't understand completely. I have a certificate and use it for imaps.
And, yes, I use PLAIN. What do I need to write in imapd.conf?
Re: 2.3.1 replication and deliver problem
On Thu, 19 Jan 2006, Dmitry Melekhov wrote:
> Patrick Radtke wrote:
>> What happens when you try
>> telnet backup 2005
> [EMAIL PROTECTED] dm]$ telnet backup 2005
> Trying 192.168.22.211...
> Connected to backup.p98.belkam.com (192.168.22.211).
> Escape character is '^]'.
> * OK backup Cyrus sync server v2.3.1
>
> So it works.

So the connection works, but there is probably a problem with the authentication. What authentication mechanism are you using for sync_client?

sync_server should advertise the available SASL mechanisms. If you are using PLAIN then you need to have a certificate, so STARTTLS will work prior to sending a password.

Here's what one of our replicas advertises:
telnet liverwurst2 2005
Trying 128.59.33.151...
Connected to liverwurst2.cc.columbia.edu (128.59.33.151).
Escape character is '^]'.
* SASL GSSAPI
* STARTTLS
* OK liverwurst2.cc.columbia.edu Cyrus sync server v2.3-alpha

-Patrick
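The difference between the two banners above is easy to miss by eye. As a hedged illustration, this Python sketch pulls the advertised SASL mechanisms and the STARTTLS flag out of a pasted banner. The function name `parse_sync_banner` is hypothetical, and the only format assumed is the "* WORD ..." lines shown in the two telnet transcripts in this message.

```python
def parse_sync_banner(banner):
    """Extract SASL mechanisms, STARTTLS support and the OK greeting.

    Only lines starting with '* ' are examined, so telnet chatter such
    as 'Trying ...' or 'Connected to ...' is ignored.
    """
    mechanisms = []
    starttls = False
    greeting = None
    for raw in banner.splitlines():
        line = raw.strip()
        if not line.startswith("* "):
            continue
        word, _, rest = line[2:].partition(" ")
        if word == "SASL":
            mechanisms.extend(rest.split())
        elif word == "STARTTLS":
            starttls = True
        elif word == "OK":
            greeting = rest
    return {"sasl": mechanisms, "starttls": starttls, "greeting": greeting}


if __name__ == "__main__":
    dmitry = "* OK backup Cyrus sync server v2.3.1"
    info = parse_sync_banner(dmitry)
    if not info["starttls"]:
        print("replica does not advertise STARTTLS")
```

Applied to the banner from 'backup', this reports no STARTTLS and no SASL mechanisms, which matches the point made above: PLAIN needs STARTTLS to be offered first.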