Re: Cyrus Deadblocking
On Fri, 2008-12-26 at 10:52 -0800, Scott Likens wrote:

> I've been running Cyrus 2.3.13 successfully on Gentoo (amd64/x86_64) for quite some time without any issues.

So did I. I am on Gentoo x86_64 with the ~amd64 keyword and never had any problems with it. The mail server isn't that big: I have about 60 mailboxes, and traffic is roughly 450 incoming and 350 outgoing messages per hour.

> It's currently linked against bdb 4.6, however I use skiplist for all my databases as I found overall that is much cleaner in the long run.

Yes, that is what worked for me for quite a long time. I had sendmail, cyrus-sasl and spamassassin (with the Perl libs) compiled against this version.

> However, I can honestly say I have never run into your issue with cyrus starting to hang like that. However, you want to ensure that both cyrus-sasl and imapd are linked to the same version of bdb, otherwise there are issues.

Try switching the deliver db from skiplist to the berkeley format and wait some time until it starts hanging...

> ... So far the point of this email is pretty pointless, but I wanted to say that switching distributions is not ever an acceptable question/answer.

Totally agree.

> Having more detail from /var/log/messages would be very helpful as cyrus does tend to send debug information to syslog when it's crashing, so we can get more detail of why.

That's the problem: it just hangs. You can see that pretty easily by running sendmail -bv s...@adresss: it never returns to the prompt, because sendmail waits for smmapd to return from checking the mailbox. Or just start an IMAP client: it will connect, but never fetch any mail. Identifying the problem is not that easy, because syslog doesn't show any DB corruption or other problems. dmesg isn't reporting anything wrong, and strace on the cyrus processes mostly produces either no output or a lot of select(0...) timeouts, which is not bad but normal, as I have heard. Even if no output in strace isn't a good sign, it still doesn't help to identify the problem.

Through trial and error I found that removing deliver.db and restarting cyrus leads to a longer life before one of the cyrus processes hangs again. So I moved the cyrus mail completely to another server, but after a few minutes it did the same. I reinstalled a new Gentoo system with the older glibc-2.8, but the problem was the same. The only thing that helps is adding duplicate_db: skiplist to imapd.conf. It ran stable on the new machine with this setting, compiled against sys-libs/db-4.6.21_p3-r1, sys-libs/glibc-2.9_p20081201 and sys-devel/gcc-4.3.2-r1. Now I have moved back to the old machine with a reinstalled system (sys-devel/gcc-4.3.2-r1, sys-libs/glibc-2.8_p20080602-r1, sys-libs/db-4.7.25_p1-r1), and it runs stable too with skiplist as the format of deliver.db. As soon as I switch back from skiplist I can reproduce the problem.

So I found a solution, but I really can't say what is wrong. I had been running this configuration for a few years already and didn't change anything radical in cyrus. I am happy to be running stable again, but if I can provide more information to identify what was wrong, I would like to help.

-- 
Teresa

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
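Editor's sketch: the fix in this message is a single imapd.conf line, duplicate_db: skiplist. A quick way to see which database backends a given imapd.conf sets explicitly is to list every option whose name ends in _db. The sample text and the /var/imap path below are hypothetical; only the duplicate_db line comes from the thread. An option that is absent falls back to the build's default, which this sketch cannot know.

```python
def db_backends(conf_text):
    """Return {option: backend} for every imapd.conf option ending in '_db'."""
    backends = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if sep and key.strip().endswith("_db"):
            backends[key.strip()] = value.strip()
    return backends


# Hypothetical imapd.conf excerpt; only duplicate_db is taken from the thread.
SAMPLE = """\
# backend used for deliver.db
duplicate_db: skiplist
configdirectory: /var/imap
"""

if __name__ == "__main__":
    for option, backend in db_backends(SAMPLE).items():
        print(option, "->", backend)
```

Any option reported with a value other than skiplist would be a candidate for the same change Teresa made.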
Re: Cyrus Deadblocking
Adam Tauno Williams wrote:

> Why? If so it makes more sense to convert your databases to skiplist and see if that helps than to flop library versions.

The problem looks to be localized. After switching deliver.db to the skiplist format it seems to run more stable (not sure yet, I have to wait some time). I will write more about what I found later, once I am 100% sure the problem is identified. One thing I noticed over all these days: if I completely wipe deliver.db, it takes longer for the cyrus processes to hang again than if I only restart cyrus.

> Madly flipping versions seems a poor diagnostic method (if it even qualifies as a diagnostic method).

In a way, you are right. But as an example, cyrus-sasl crashes if it is compiled against the 4.3.x versions of Berkeley DB, and works great with 4.5, 4.6 and 4.7. So sometimes trying a different version gives some result too.

> The best approach is to switch to a distribution

1) Not acceptable.
2) You don't really believe what you wrote here yourself, do you?

-- 
Teresa
Re: Cyrus Deadblocking
Hi Teresa,

I've been running Cyrus 2.3.13 successfully on Gentoo (amd64/x86_64) for quite some time without any issues. It's currently linked against bdb 4.6; however, I use skiplist for all my databases, as I found that to be much cleaner overall in the long run.

However, I can honestly say I have never run into your issue with cyrus starting to hang like that. You do want to ensure that both cyrus-sasl and imapd are linked to the same version of bdb, otherwise there are issues.

...

So far this email is pretty pointless, but I wanted to say that switching distributions is never an acceptable answer.

Having more detail from /var/log/messages would be very helpful, as cyrus does tend to send debug information to syslog when it's crashing, so we can get more detail on why.

Scott

On Dec 26, 2008, at 4:06 AM, Teresa wrote:

> Adam Tauno Williams wrote:
>
> > Why? If so it makes more sense to convert your databases to skiplist and see if that helps than to flop library versions.
>
> The problem looks to be localized. After switching deliver.db to the skiplist format it seems to run more stable (not sure yet, I have to wait some time). I will write more about what I found later, once I am 100% sure the problem is identified. One thing I noticed over all these days: if I completely wipe deliver.db, it takes longer for the cyrus processes to hang again than if I only restart cyrus.
>
> > Madly flipping versions seems a poor diagnostic method (if it even qualifies as a diagnostic method).
>
> In a way, you are right. But as an example, cyrus-sasl crashes if it is compiled against the 4.3.x versions of Berkeley DB, and works great with 4.5, 4.6 and 4.7. So sometimes trying a different version gives some result too.
>
> > The best approach is to switch to a distribution
>
> 1) Not acceptable.
> 2) You don't really believe what you wrote here yourself, do you?
>
> -- 
> Teresa
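Editor's sketch of Scott's advice that cyrus-sasl and imapd should be linked against the same bdb version. One way to check is to run ldd on each binary or library and compare the libdb-X.Y.so names that appear. The ldd lines below are hypothetical samples (the library name libdb-4.6.so does appear in the dmesg output quoted elsewhere in the thread; the paths and load addresses are invented for illustration).

```python
import re

# Matches versioned Berkeley DB sonames such as "libdb-4.6.so".
LIBDB = re.compile(r"libdb-(\d+\.\d+)\.so")


def libdb_versions(ldd_output):
    """Return the set of Berkeley DB versions named in one ldd listing."""
    return set(LIBDB.findall(ldd_output))


def same_bdb(*ldd_outputs):
    """True if at most one Berkeley DB version appears across all listings."""
    versions = set()
    for output in ldd_outputs:
        versions |= libdb_versions(output)
    return len(versions) <= 1


# Hypothetical ldd excerpts, for illustration only.
IMAPD_LDD = "\tlibdb-4.6.so => /usr/lib/libdb-4.6.so (0x00007f0000000000)\n"
SASL_LDD = "\tlibdb-4.7.so => /usr/lib/libdb-4.7.so (0x00007f0000100000)\n"

if __name__ == "__main__":
    print("imapd:", libdb_versions(IMAPD_LDD))
    print("sasl: ", libdb_versions(SASL_LDD))
    print("consistent:", same_bdb(IMAPD_LDD, SASL_LDD))
```

In practice the two strings would come from running ldd against the installed imapd binary and the cyrus-sasl library on the affected machine.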
Re: Cyrus Deadblocking
On Tue, 23 Dec 2008 11:30:31 +0100, Teresa teresa...@myeburg.net wrote:

> Teresa wrote:
>
> > reconstruct -r -f user and now run cyrus without squatter. And it seems to work. I have no idea whether it was squatter or a few broken folders. Running stable for about 6 hours now.
>
> OK, latest state. After 13 hours of happy running it hung again. I downgraded the kernel to 2.6.26.8 and it didn't change anything. The behaviour is the same as with 2.6.27.10. So I think it is because of Berkeley DB and glibc-2.9.

It still hangs randomly. One of the cyrus processes (ipurge, smmapd, imapd or pop3) just hangs. Sometimes it takes a few minutes to happen, sometimes a few hours, or it can even run for a whole week. What I did now is update to db-4.7.25; maybe it works more stably with glibc-2.9, I don't know.

-- 
Teresa
Re: Cyrus Deadblocking
On Wed, 2008-12-24 at 14:27 +0100, Teresa wrote:

> On Tue, 23 Dec 2008 11:30:31 +0100, Teresa teresa...@myeburg.net wrote:
>
> > Teresa wrote:
> >
> > > reconstruct -r -f user and now run cyrus without squatter. And it seems to work. I have no idea whether it was squatter or a few broken folders. Running stable for about 6 hours now.
> >
> > OK, latest state. After 13 hours of happy running it hung again. I downgraded the kernel to 2.6.26.8 and it didn't change anything. The behaviour is the same as with 2.6.27.10. So I think it is because of Berkeley DB and glibc-2.9.

Why? If so it makes more sense to convert your databases to skiplist and see if that helps than to flop library versions.

> It still hangs randomly. One of the cyrus processes (ipurge, smmapd, imapd or pop3) just hangs. Sometimes it takes a few minutes to happen, sometimes a few hours, or it can even run for a whole week. What I did now is update to db-4.7.25; maybe it works more stably with glibc-2.9, I don't know.

Madly flipping versions seems a poor diagnostic method (if it even qualifies as a diagnostic method). The best approach is to switch to a distribution where things are tested and shipped in a known-working binary (w/dependencies) built by people who actually understand what the various compiler options mean, etc... Your method of shotgunning various library versions isn't very likely to lead you to a solution.
Re: Cyrus Deadblocking
On Tue, 2008-12-23 at 02:44 +0100, Teresa wrote:

> Adam Tauno Williams wrote:
>
> > Does dmesg show anything odd?
>
> Another thing I sometimes get when attaching strace to a hanging cyrus process is a lot of:
>
>     select(0, NULL, NULL, NULL, {0, 25000}) = 0 (Timeout)
>
> a few per second, and it never ends...

The above should be pretty normal. select polls for any I/O, times out (because there is nothing to do), and then the process re-issues the select. Many services/servers use such a method to handle async I/O.

I'd guess the above is a call to:

    int select(int nfds, fd_set *readfds, fd_set *writefds,
               fd_set *exceptfds, struct timeval *timeout);

so the last value, the {0, 25000}, is the timeout timeval struct:

    struct timeval {
        __time_t      tv_sec;   /* Seconds. */
        __suseconds_t tv_usec;  /* Microseconds. */
    };

so you get one of the select(...) calls roughly every 25,000 microseconds.
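Editor's sketch of the idle-polling pattern Adam describes: a select call with no file descriptors and a 25,000 microsecond timeout simply sleeps and returns "timeout". The Python below reproduces one such call and works out how many of them fit into a second. It assumes a Unix-like system (select with three empty lists is not accepted on Windows); the function names are illustrative and not taken from Cyrus.

```python
import select
import time

TIMEOUT_USEC = 25_000  # the {0, 25000} timeval from the strace line


def poll_once(timeout_usec=TIMEOUT_USEC):
    """Issue one select() with no descriptors; return (timed_out, elapsed_seconds)."""
    start = time.monotonic()
    readable, writable, exceptional = select.select(
        [], [], [], timeout_usec / 1_000_000
    )
    elapsed = time.monotonic() - start
    # Nothing to watch, so all three lists come back empty: a timeout.
    timed_out = not (readable or writable or exceptional)
    return timed_out, elapsed


def calls_per_second(timeout_usec=TIMEOUT_USEC):
    """Upper bound on back-to-back timed-out select() calls per second."""
    return 1_000_000 / timeout_usec


if __name__ == "__main__":
    timed_out, elapsed = poll_once()
    print("timed out:", timed_out, "after", round(elapsed * 1000, 1), "ms")
    print("max calls per second:", calls_per_second())
```

With a 25 ms timeout the loop can spin at most 40 times per second, so a steady stream of these lines in strace is consistent with a process that is idling rather than one that is burning CPU.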
Re: Cyrus Deadblocking
On Mon, 2008-12-22 at 23:11 +0100, Teresa wrote:

> Adam Tauno Williams wrote:
>
> > > Since yesterday I have had strange behaviour on my production mail server, and I have not been able to find the reason for 2 days already.
> >
> > Does dmesg show anything odd?
>
> OK guys, it happened again just now. Exactly the same behaviour as described before, but after a few days of running successfully. I restarted cyrus and sendmail and attached strace to lmtpd this time. After some time (squatter was still working) it went up to 100% CPU and stopped answering. It hangs.
>
> > If you attach to a hung process with strace -p {pid} what does it look like?
>
> No idea what it says, but here is my strace output from the lmtpd that hangs at the end. Then I sent the kill command, which you can see on the last line: http://kvitka.net/log2.strace.txt

Looks like one of the last things it did was put a message into user.teresa.Junk and then notify idled that the contents of user.teresa.Junk had changed. Nothing very suspicious.
Re: Cyrus Deadblocking
Teresa wrote:

> Hi all, since yesterday I have had strange behaviour on my production mail server, and I have not been able to find the reason for 2 days already.

OK, new report. I ran:

    reconstruct -r -f user

and now run cyrus without squatter. And it seems to work. I have no idea whether it was squatter or a few broken folders. Running stable for about 6 hours now.

-- 
Teresa
Re: Cyrus Deadblocking
Adam Tauno Williams wrote:

> > Since yesterday I have had strange behaviour on my production mail server, and I have not been able to find the reason for 2 days already.
>
> Does dmesg show anything odd?

OK guys, it happened again just now. Exactly the same behaviour as described before, but after a few days of running successfully. I restarted cyrus and sendmail and attached strace to lmtpd this time. After some time (squatter was still working) it went up to 100% CPU and stopped answering. It hangs.

> If you attach to a hung process with strace -p {pid} what does it look like?

No idea what it says, but here is my strace output from the lmtpd that hangs at the end. Then I sent the kill command, which you can see on the last line: http://kvitka.net/log2.strace.txt

I updated to the latest kernel, 2.6.27.10.

    cyrus  1509  0.0  0.0  36508   1940 ?  Ss  22:51  0:00 /usr/lib/cyrus/master
    cyrus  1557  0.0  0.0  70600    652 ?  S   22:51  0:00 idled
    cyrus  1578 23.1 10.0 284276 208016 ?  R   22:51  3:57 squatter -r user
    cyrus  1583  0.0  0.1  98224   4012 ?  S   22:51  0:00 imapd -s
    cyrus  1584  0.0  0.1  98224   3996 ?  S   22:51  0:00 imapd -s
    cyrus  1585  0.0  0.1  98224   3912 ?  S   22:51  0:00 imapd -s
    cyrus  1586  0.0  0.1  98224   3932 ?  S   22:51  0:00 imapd -s
    cyrus  1587  0.0  0.1  98224   4012 ?  S   22:51  0:00 imapd -s
    cyrus  1737  0.0  0.0  74956   2048 ?  S   22:51  0:00 smmapd
    cyrus  2061  0.3  0.1  98012   3888 ?  S   23:07  0:00 pop3d -s

Now I get a lot of these processes, and everything seems to work again. Let's see what happens if I start sendmail...

Any idea what's wrong?

-- 
Teresa
Re: Cyrus Deadblocking
Teresa wrote:

> Let's see what happens if I start sendmail...

It still hangs after running for some time... Now it sometimes even damages the DB. I delete everything except mailbox.db and it starts again... but not for long...

-- 
Teresa
Re: Cyrus Deadblocking
Adam Tauno Williams wrote:

> Does dmesg show anything odd?

Another thing I sometimes get when attaching strace to a hanging cyrus process is a lot of:

    select(0, NULL, NULL, NULL, {0, 25000}) = 0 (Timeout)

a few per second, and it never ends...

-- 
Teresa
Re: Cyrus Deadblocking
On Mon, 15 Dec 2008 10:42:08 -0200, Henrique de Moraes Holschuh h...@debian.org wrote:

> On Mon, 15 Dec 2008, Teresa wrote:
>
> Which kernel? If it is Linux 2.6.27.8 or 2.6.27.9, try downgrading...

Thanks for the response. I use the 2.6.27.7 vanilla kernel (not gentoo-sources). I hadn't rebooted for nearly a month. Yesterday I also rebooted, in a last hope that it would fix something (I know that doesn't work, and it didn't; it never does if something isn't working :).

Actually the system is running stable again now. I didn't change, compile or reboot anything. I just restarted cyrus and sendmail a few times, and after one of these restarts it ran stable. I have no idea why. Nothing is different. There are no system log messages about a broken DB or anything else. One strange thing I saw in dmesg during these 2 days was:

    lmtpd[3467] general protection ip:7f2e45ffdb2e sp:7fff4ee81968 error:0 in libdb-4.6.so[7f2e45f2d000+136000]

I think this comes from the new glibc. But it doesn't break functionality for now. It has been working stably for 4 hours already, and the system load has gone down to 0.0 again. No deadlocking...

I saw there are new ebuilds for Berkeley DB 4.7.25 in portage. Has anybody used this version already? Maybe compiling cyrus against this lib will perform better? Or does this look more like a kernel problem? How do I check that? In htop the process that takes 100% CPU isn't in D state, so it's not a real deadlock; I suppose it just goes into a loop somewhere under some condition that doesn't always happen.

Yesterday I also tried downgrading to cyrus-imapd 2.3.12_p2, but got the same behaviour, so I updated back to 2.3.13. I will report if I find something more.

-- 
Teresa
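Editor's sketch of the check Teresa did by eye in htop: on Linux, the scheduler state of a process (R for running, S for sleeping, D for uninterruptible disk sleep) is also exposed on the State: line of /proc/<pid>/status. A process spinning at 100% CPU would normally show R, while one blocked in the kernel would show D. The helper names are illustrative, and the sample text is a hypothetical excerpt (only the process name lmtpd and PID 3467 come from the thread).

```python
def process_state(status_text):
    """Return the one-letter state code from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tR (running)"; the code is field 2.
            return line.split()[1]
    return None


def read_state(pid):
    """Read the state of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return process_state(handle.read())


# Hypothetical /proc/<pid>/status excerpt, for illustration only.
SAMPLE_STATUS = "Name:\tlmtpd\nState:\tR (running)\nPid:\t3467\n"

if __name__ == "__main__":
    print(process_state(SAMPLE_STATUS))
```

Sampling read_state for the busy PID a few times in a row would show whether it stays in R, which supports the "loop, not deadlock" reading in the message above.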
Re: Cyrus Deadblocking
On Mon, 15 Dec 2008 08:40:31 -0500, Adam Tauno Williams awill...@whitemice.org wrote:

> > Since yesterday I have had strange behaviour on my production mail server, and I have not been able to find the reason for 2 days already.
>
> Does dmesg show anything odd?

Not really. It's quiet; only these strange messages have appeared in the last 2 days, and they always look like this:

    lmtpd[3467] general protection ip:7f2e45ffdb2e sp:7fff4ee81968 error:0 in libdb-4.6.so[7f2e45f2d000+136000]

> > I didn't change anything lately, but yesterday my cyrus started to raise the CPU

But it has still been working for 4 hours here, even though this message now appears once in my dmesg.

> If you attach to a hung process with strace -p {pid} what does it look like?

It is running now, and as it's a production server, I will leave it running for as long as it keeps going by itself :) But next time, and I am fairly sure it will happen again, I will do that strace. I have been running this mailbox since 2003. Cyrus had some nasty problems with Berkeley DB a few times in the past (2.2.x versions), but for the last 2 years I never really had a problem with it.

> Did you restart the services after the update?
>
> > I am on a Gentoo box.

Gentoo is OK. I got myself into trouble because I run the unstable ~x86_64 keyword. I know that, so I have to manage my problems myself. Gentoo has nothing to do with that. But you're right, something in the system is not right at the moment.

> > If cyrus goes into the blocking state,
>
> Sounds to me like Cyrus is not the only thing getting hung, which indicates the problem probably lies elsewhere.

Actually only the cyrus processes are in trouble. ipurge does its job, for example, but never returns to the prompt.

I now have kernel 2.6.27.9 installed; the machine is currently running 2.6.27.7. If it crashes again, I will reboot into the new kernel and see whether anything has changed.

-- 
Teresa
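Editor's sketch: the dmesg line quoted in this message carries enough information to locate the fault inside the library. The kernel prints the faulting instruction pointer (ip) and the library's load address and mapping size in brackets, so ip minus load address gives an offset into libdb-4.6.so that could be handed to a symbol lookup tool. The parser below is an illustration written against this one line, not a general dmesg parser; the function name is hypothetical.

```python
import re

# Field layout as it appears in the quoted dmesg line.
GP_FAULT = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\] general protection "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (library, offset_of_ip_within_library) or None if not parseable."""
    match = GP_FAULT.search(line)
    if match is None:
        return None
    ip = int(match["ip"], 16)
    base = int(match["base"], 16)
    size = int(match["size"], 16)
    offset = ip - base
    if not 0 <= offset < size:
        return None  # ip does not fall inside the reported mapping
    return match["lib"], offset


# The line exactly as quoted in the thread.
LINE = (
    "lmtpd[3467] general protection ip:7f2e45ffdb2e sp:7fff4ee81968 "
    "error:0 in libdb-4.6.so[7f2e45f2d000+136000]"
)

if __name__ == "__main__":
    library, offset = fault_offset(LINE)
    print(library, hex(offset))
```

For the quoted line this works out to offset 0xd0b2e inside libdb-4.6.so, which places the fault in Berkeley DB code itself rather than in glibc.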
Re: Cyrus Deadblocking
On Mon, 15 Dec 2008, Patrick Boutilier wrote:

> Henrique de Moraes Holschuh wrote:
>
> > On Mon, 15 Dec 2008, Teresa wrote:
> >
> > > Since yesterday I have had strange behaviour on my production mail server, and I have not been able to find the reason for 2 days already.
> >
> > Which kernel? If it is Linux 2.6.27.8 or 2.6.27.9, try downgrading...
>
> What is wrong with those kernels?

The lack of this: http://lkml.indiana.edu/hypermail/linux/kernel/0812.1/00998.html

Thread here: http://lkml.indiana.edu/hypermail/linux/kernel/0812.1/index.html#6

2.6.27.10 will be much better. I am not touching 2.6.27 at all until it is out (still running 2.6.26.y here), but probably I won't consider it until it reaches 2.6.27.12 or thereabouts.

No, I am not sure it would break Cyrus IMAP. But one doesn't let Cyrus IMAP anywhere near a kernel that is suspected of less than pristine shared memory or mmap behaviour. It would be the same as walking around with dead fish in a basket, near a bunch of starved cats.

-- 
"One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
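Editor's sketch: Henrique singles out exactly two kernel releases, 2.6.27.8 and 2.6.27.9. A small check like the one below could flag a release string against that list. The list reflects only what this message says, not an authoritative record of affected kernels, and the "-gentoo" suffix in the comment is a hypothetical example of a distribution tag.

```python
# Releases named as suspect in the message above; nothing beyond that is implied.
SUSPECT = {(2, 6, 27, 8), (2, 6, 27, 9)}


def parse_kernel(release):
    """Turn '2.6.27.9' or '2.6.27.9-gentoo' into a tuple of ints.

    Assumes the part before the first '-' is purely dotted digits.
    """
    numeric = release.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def is_suspect(release):
    """True if the release is one of the two kernels named in the thread."""
    return parse_kernel(release) in SUSPECT


if __name__ == "__main__":
    for release in ("2.6.27.7", "2.6.27.9", "2.6.27.10"):
        print(release, "suspect" if is_suspect(release) else "not listed")
```

By this list, Teresa's running kernel 2.6.27.7 is not one of the flagged releases, while the 2.6.27.9 she had installed for the next reboot is.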
Cyrus Deadblocking
Hi all,

Since yesterday I have had strange behaviour on my production mail server, and I have not been able to find the reason for 2 days already.

I didn't change anything lately, but yesterday my cyrus started to raise the CPU load up to 100%, and after some time it stopped responding. Mostly it is an lmtp process, but it also happens to pop3, or to the imapd process itself. What helps: a restart. There is nothing in the log that would show the problem. All sendmail processes are blocked as well, since they use smmapd for local delivery.

About 2 weeks ago I updated glibc to version 2.9. But it worked fine for these two weeks.

I am on a Gentoo box.

    [ebuild R ] sys-libs/db-4.6.21_p3-r1 USE=-bootstrap -doc -java -nocxx -tcl -test 0 kB
    [ebuild R ] sys-libs/glibc-2.9_p20081201 USE=gd (multilib) nls -debug -glibc-compat20 -glibc-omitfp (-hardened) -profile (-selinux) -vanilla 0 kB
    [ebuild R ] net-mail/cyrus-imapd-2.3.13 USE=idled pam ssl tcpd (-drac) -kerberos -kolab -nntp -replication -snmp 0 kB

I use squatter, sieve, imap and pop3. ipurge is started from cron from time to time.

If cyrus goes into the blocking state and I start ipurge manually, I get the messages about how many messages will be deleted, how many were scanned and so on, but the process itself never returns to the prompt.

I understand that this description doesn't provide any useful information that would help identify the problem. If I could identify it, I would probably have fixed it already. But this is my last hope; maybe someone can point out what's wrong?

-- 
Teresa
Re: Cyrus Deadblocking
Henrique de Moraes Holschuh wrote:

> On Mon, 15 Dec 2008, Teresa wrote:
>
> > Since yesterday I have had strange behaviour on my production mail server, and I have not been able to find the reason for 2 days already.
>
> Which kernel? If it is Linux 2.6.27.8 or 2.6.27.9, try downgrading...

What is wrong with those kernels?
Re: Cyrus Deadblocking
On Mon, 15 Dec 2008, Teresa wrote:

> Since yesterday I have had strange behaviour on my production mail server, and I have not been able to find the reason for 2 days already.

Which kernel? If it is Linux 2.6.27.8 or 2.6.27.9, try downgrading...

-- 
"One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
Re: Cyrus Deadblocking
> Since yesterday I have had strange behaviour on my production mail server, and I have not been able to find the reason for 2 days already.

Does dmesg show anything odd?

> I didn't change anything lately, but yesterday my cyrus started to raise the CPU load up to 100%, and after some time it stopped responding. Mostly it is an lmtp process, but it also happens to pop3, or to the imapd process itself. What helps: a restart. There is nothing in the log that would show the problem.

If you attach to a hung process with strace -p {pid} what does it look like?

> All sendmail processes are blocked as well, since they use smmapd for local delivery. About 2 weeks ago I updated glibc to version 2.9. But it worked fine for these two weeks.

Did you restart the services after the update?

> I am on a Gentoo box.

Oh.

> [ebuild R ] sys-libs/db-4.6.21_p3-r1 USE=-bootstrap -doc -java -nocxx -tcl -test 0 kB
> [ebuild R ] sys-libs/glibc-2.9_p20081201 USE=gd (multilib) nls -debug -glibc-compat20 -glibc-omitfp (-hardened) -profile (-selinux) -vanilla 0 kB
> [ebuild R ] net-mail/cyrus-imapd-2.3.13 USE=idled pam ssl tcpd (-drac) -kerberos -kolab -nntp -replication -snmp 0 kB

I assume the above is some Gentoo thing.

> I use squatter, sieve, imap and pop3. ipurge is started from cron from time to time. If cyrus goes into the blocking state,

Sounds to me like Cyrus is not the only thing getting hung, which indicates the problem probably lies elsewhere.

> and I start ipurge manually, I get the messages about how many messages will be deleted, how many were scanned and so on, but the process itself never returns to the prompt. I understand that this description doesn't provide any useful information that would help identify the problem. If I could identify it, I would probably have fixed it already. But this is my last hope; maybe someone can point out what's wrong?