Bug#596291: offlineimap getting stuck when run with multiple threads/connections
Hi,

A couple of final notes. I get this hang even when running with -1, and
I only get it when on the same LAN as the IMAP server. When fetching
remotely, it's fine. Again, this points to a timing-dependent bug that
is only tickled when the sync isn't being slowed down by latency or
something else. Sorry I'm not being more helpful.

Cheers,
--
Stephen Gran | sg...@debian.org
Debian user, admin, and developer | http://www.debian.org
Bug#596291: offlineimap getting stuck when run with multiple threads/connections
This one time, at band camp, Stephen Gran said:
> I have so far been unable to reproduce this when running offlineimap
> under strace, which suggests a timing-dependent bug (kind of as
> suspected, but still not all that helpful, I realize). The earlier
> strace (and the strace I see when attaching after it hangs) does
> suggest that a lock is not being released somewhere, but I can't
> immediately see anything obviously wrong with the logic in the code.

Trying to run with -d thread under python2.6 gets me a recursion
exceeded exception, but I did manage to capture a lockup under 2.5:

  Folder sync Lobefin[packaging.clamav]: <threading._BoundedSemaphore object at 0x7f0c925d3210>.acquire: success, value=0
  Folder sync Lobefin[packaging.clamav]: InstanceLimitedThread(Copy message 2200 from packaging.clamav, initial daemon).start(): starting thread
  Copy message 2200 from packaging.clamav: InstanceLimitedThread(Copy message 2200 from packaging.clamav, started daemon).__bootstrap(): thread started
  Copy message 2200 from packaging.clamav: Copy message 2200 IMAP[packaging.clamav] -> Maildir[packaging.clamav], LocalStatus[packaging.clamav]
  Copy message 2200 from packaging.clamav: <threading._BoundedSemaphore object at 0x7f0c91080e10>.acquire: success, value=0
  Folder sync Lobefin[packaging.clamav]: <threading._BoundedSemaphore object at 0x7f0c91080e10>.acquire(1): blocked waiting, value=0
  Copy message 2200 from packaging.clamav: <threading._BoundedSemaphore object at 0x7f0c91080e10>.release: success, value=1
  Copy message 2200 from packaging.clamav: <Condition(<thread.lock object at 0x7f0c9d86f288>, 1)>.notify(): notifying 1 waiter
  Folder sync Lobefin[packaging.clamav]: <Condition(<thread.lock object at 0x7f0c9d86f288>, 0)>.wait(): got it
  Folder sync Lobefin[packaging.clamav]: <threading._BoundedSemaphore object at 0x7f0c91080e10>.acquire: success, value=0
  Folder sync Lobefin[packaging.clamav]: <threading._BoundedSemaphore object at 0x7f0c91080e10>.release: success, value=1
  Folder sync Lobefin[packaging.clamav]: <Condition(<thread.lock object at 0x7f0c9d86f288>, 0)>.notify(): no waiters
  Folder sync Lobefin[packaging.clamav]: <threading._BoundedSemaphore object at 0x7f0c925d3210>.acquire(1): blocked waiting, value=0
  Copy message 2200 from packaging.clamav: <threading._BoundedSemaphore object at 0x7f0c925d3210>.release: success, value=1
  Copy message 2200 from packaging.clamav: <Condition(<thread.lock object at 0x7f0c9d86f1e0>, 1)>.notify(): notifying 1 waiter
  Copy message 2200 from packaging.clamav: InstanceLimitedThread(Copy message 2200 from packaging.clamav, started daemon).__bootstrap(): normal return
  Folder sync Lobefin[packaging.clamav]: <Condition(<thread.lock object at 0x7f0c9d86f1e0>, 0)>.wait(): got it
  Copy message 2200 from packaging.clamav: <Condition(<thread.lock object at 0x7a81b40>, 0)>.notify(): no waiters
  Folder sync Lobefin[packaging.clamav]: <threading._BoundedSemaphore object at 0x7f0c925d3210>.acquire: success, value=0
  Folder sync Lobefin[packaging.clamav]: InstanceLimitedThread(Copy message 2201 from packaging.clamav, initial daemon).start(): starting thread
  Copy message 2201 from packaging.clamav: InstanceLimitedThread(Copy message 2201 from packaging.clamav, started daemon).__bootstrap(): thread started
  Copy message 2201 from packaging.clamav: Copy message 2201 IMAP[packaging.clamav] -> Maildir[packaging.clamav], LocalStatus[packaging.clamav]
  Copy message 2201 from packaging.clamav: <threading._BoundedSemaphore object at 0x7f0c91080e10>.acquire: success, value=0
  Folder sync Lobefin[packaging.clamav]: <threading._BoundedSemaphore object at 0x7f0c91080e10>.acquire(1): blocked waiting, value=0
  Copy message 2201 from packaging.clamav: <threading._BoundedSemaphore object at 0x7f0c91080e10>.release: success, value=1
  Copy message 2201 from packaging.clamav: <Condition(<thread.lock object at 0x7f0c9d86f288>, 1)>.notify(): notifying 1 waiter
  Folder sync Lobefin[packaging.clamav]: <Condition(<thread.lock object at 0x7f0c9d86f288>, 0)>.wait(): got it
  Folder sync Lobefin[packaging.clamav]: <threading._BoundedSemaphore object at 0x7f0c91080e10>.acquire: success, value=0
  Folder sync Lobefin[packaging.clamav]: <threading._BoundedSemaphore object at 0x7f0c91080e10>.release: success, value=1
  Folder sync Lobefin[packaging.clamav]: <Condition(<thread.lock object at 0x7f0c9d86f288>, 0)>.notify(): no waiters
  Folder sync Lobefin[packaging.clamav]: <threading._BoundedSemaphore object at 0x7f0c925d3210>.acquire(1): blocked waiting, value=0
  Copy message 2201 from packaging.clamav: <threading._BoundedSemaphore object at 0x7f0c925d3210>.release: success, value=1
  Copy message 2201 from packaging.clamav: <Condition(<thread.lock object at 0x7f0c9d86f1e0>, 1)>.notify(): notifying 1 waiter
  Copy message 2201 from packaging.clamav: InstanceLimitedThread(Copy message 2201 from packaging.clamav, started daemon).__bootstrap(): normal return
  Folder sync Lobefin[packaging.clamav]: <Condition(<thread.lock object at 0x7f0c9d86f1e0>, 0)>.wait(): got it
  Copy message 2201 from
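For readers following the debug log: an InstanceLimitedThread caps how many
threads of one kind run at once via a BoundedSemaphore, acquired before the
thread starts and released when it finishes (the acquire/release pairs in the
log above). The sketch below is a rough reconstruction of that pattern, not
offlineimap's actual code; only the class name comes from the log, everything
else is an assumption.

```python
import threading

# Rough sketch (NOT offlineimap's real implementation) of the pattern
# visible in the log: one BoundedSemaphore per thread "kind" caps how
# many such threads run concurrently.
_limits = {}
_limits_lock = threading.Lock()

def _limit_for(kind, limit=3):
    # Lazily create one semaphore per thread kind.
    with _limits_lock:
        if kind not in _limits:
            _limits[kind] = threading.BoundedSemaphore(limit)
        return _limits[kind]

class InstanceLimitedThread(threading.Thread):
    def __init__(self, kind, target, args=()):
        threading.Thread.__init__(self, target=target, args=args)
        self.kind = kind

    def start(self):
        # This is the "acquire(1): blocked waiting, value=0" step in the
        # log: start() blocks until a slot frees up.
        _limit_for(self.kind).acquire()
        threading.Thread.start(self)

    def run(self):
        try:
            threading.Thread.run(self)
        finally:
            # Releasing in a finally block matters: if a release is ever
            # skipped (say, on an exception path), the semaphore leaks a
            # slot and later start() calls block forever -- one plausible
            # shape for a hang like the one reported here.
            _limit_for(self.kind).release()
```

With this structure, a lockup like the one captured above would correspond to
an acquire whose matching release never happens.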
Bug#596291: offlineimap getting stuck when run with multiple threads/connections
This one time, at band camp, Rogério Brito said:
> As I reported in another bug (about offlineimap dying too easily under
> many circumstances), attached is what I see when I strace one of the
> running threads of offlineimap when it is run with 10 connections to
> initially mirror my gmail account to a local maildir. I am not trying
> anymore to push the messages to the gmail server---I am now trying to
> pull a fresh copy of them from there. If there is anything that would
> be desired and if I can reproduce it, then, please, let me know.

Hi,

Just a me too report. I'm seeing it die after fetching a few thousand
mails here, although the time to hang is variable. I've tried running it
against both the lenny dovecot and the current bpo dovecot, and it's the
same. I've also tried with both python 2.6 and 2.5 to see if the recent
python transition is what broke it, but it also hangs with 2.5.

Below is the end of the debug imap output. Trying to run debug thread
(which is where it seems to actually hang) throws a recursion exceeded
exception, which is slightly unhelpful :)

Thanks for looking at this.

  Copy message 3166 from lists.clamav-users: Copy message 3166 IMAP[lists.clamav-users] -> Maildir[lists.clamav-users], LocalStatus[lists.clamav-users]
  Copy message 3166 from lists.clamav-users: DEBUG[imap]: 49:23.94 GAHI1199 UID FETCH 3166 (BODY.PEEK[])

The server is just sat in:

  epoll_wait(5, {}, 4, 5000) = 0

So it looks like they've somehow got out of sync with each other. I'm
assuming the server thinks it's answered the last request, and
offlineimap thinks it's made the last request, so they're deadlocked
waiting for the other side to do something.

Cheers,
--
Stephen Gran | sg...@debian.org
Debian user, admin, and developer | http://www.debian.org
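The stalemate described above, each side waiting for the other to act, is easy
to reproduce in miniature. The sketch below is a generic illustration, not
offlineimap code: a client blocks in recv() waiting for a reply to a request
the server never received, and a socket timeout (a hypothetical mitigation,
not something offlineimap used here) surfaces the silent hang as an exception.

```python
import socket

# Generic illustration of the deadlock shape: the client's request was
# "lost" before reaching the server, so the server writes nothing and
# the client's recv() would block forever without a timeout.
server_sock, client_sock = socket.socketpair()
client_sock.settimeout(0.5)  # without this, recv() below blocks indefinitely

try:
    client_sock.recv(4096)   # waiting for a reply that will never come
    timed_out = False
except socket.timeout:
    timed_out = True

server_sock.close()
client_sock.close()
```

A per-socket timeout like this does not fix the underlying bug, but it turns
an indefinite hang into an error that can be logged and retried.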
Bug#596291: offlineimap getting stuck when run with multiple threads/connections
This one time, at band camp, Stephen Gran said:
>   Copy message 3166 from lists.clamav-users: Copy message 3166 IMAP[lists.clamav-users] -> Maildir[lists.clamav-users], LocalStatus[lists.clamav-users]
>   Copy message 3166 from lists.clamav-users: DEBUG[imap]: 49:23.94 GAHI1199 UID FETCH 3166 (BODY.PEEK[])
>
> The server is just sat in:
>
>   epoll_wait(5, {}, 4, 5000) = 0

I've managed to correlate the server side and the client side a bit more:

  Copy message 2099 from debian.amd64.2007: Copy message 2099 IMAP[debian.amd64.2007] -> Maildir[debian.amd64.2007], LocalStatus[debian.amd64.2007]
  Copy message 2100 from debian.amd64.2007: Copy message 2100 IMAP[debian.amd64.2007] -> Maildir[debian.amd64.2007], LocalStatus[debian.amd64.2007]

But the server sees:

  [pid 12586] read(0, "BHAJ7358 UID FETCH 2099 (BODY.PEEK[])\r\n"..., 3029) = 39
  [pid 12586] write(1, "* 2099 FETCH (UID 2099 BODY[] {5143}\r\nReturn-path: "..., 4094) = 4094
  [pid 12586] write(1, "BHAJ7358 OK Fetch completed.\r\n"..., 1120) = 1120
  [pid 12586] epoll_wait(5, <unfinished ...>
  [pid 12568] <... epoll_wait resumed> {}, 14, 1000) = 0
  [pid 12568] epoll_wait(10, <unfinished ...>
  [pid 12586] <... epoll_wait resumed> {}, 3, 517) = 0
  [pid 12586] epoll_wait(5, <unfinished ...>
  [pid 12568] <... epoll_wait resumed> {}, 14, 1000) = 0

and there it hangs. So it looks like the client thinks it's sent a FETCH
(for message 2100), but the server never sees it. Hope this is helpful.

I have so far been unable to reproduce this when running offlineimap
under strace, which suggests a timing-dependent bug (kind of as
suspected, but still not all that helpful, I realize). The earlier
strace (and the strace I see when attaching after it hangs) does suggest
that a lock is not being released somewhere, but I can't immediately see
anything obviously wrong with the logic in the code.

Cheers,
--
Stephen Gran | sg...@debian.org
Debian user, admin, and developer | http://www.debian.org
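The correlation above was done by eye; matching the IMAP command tags
(BHAJ7358, GAHI1199, ...) between the client's debug log and the server's
strace output can be mechanised. Below is a hypothetical helper, with the log
and trace formats assumed from the excerpts in this report:

```python
import re

# Hypothetical helper for the correlation done by hand above: report
# IMAP command tags the client issued but the server never read.
# The tag shape (four uppercase letters plus digits) is taken from the
# tags visible in this report.
TAG_RE = re.compile(r'\b([A-Z]{4}\d+)\b')

def missing_tags(client_log, server_trace):
    sent = set(TAG_RE.findall(client_log))   # tags in client debug output
    seen = set(TAG_RE.findall(server_trace)) # tags in server strace reads
    return sorted(sent - seen)

# Sample inputs shaped like the excerpts above (the second FETCH is the
# one the server never sees):
client_log = (
    'DEBUG[imap]: 49:23.94 GAHI1199 UID FETCH 3166 (BODY.PEEK[])\n'
    'DEBUG[imap]: 49:24.10 GAHI1200 UID FETCH 3167 (BODY.PEEK[])\n'
)
server_trace = (
    'read(0, "GAHI1199 UID FETCH 3166 (BODY.PEEK[])\\r\\n"..., 3029) = 39\n'
)

missing = missing_tags(client_log, server_trace)
```

Any tag reported as missing marks exactly the request on which the two sides
deadlocked.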
Bug#596291: offlineimap getting stuck when run with multiple threads/connections
Package: offlineimap
Severity: important

As I reported in another bug (about offlineimap dying too easily under
many circumstances), attached is what I see when I strace one of the
running threads of offlineimap when it is run with 10 connections to
initially mirror my gmail account to a local maildir. I am no longer
trying to push the messages to the gmail server---I am now trying to
pull a fresh copy of them from there.

If there is anything that would be desired and if I can reproduce it,
then, please, let me know.

Regards,
--
Rogério Brito : rbr...@{ime.usp.br,gmail.com} : GPG key 1024D/7C2CAEB8
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  read(12, "", 5) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  read(12, "", 5) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  read(12, "", 5) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  read(12, "", 5) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  read(12, "", 5) = 0
  futex(0xa2aa1b8, FUTEX_WAKE_PRIVATE, 1) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  read(12, "", 5) = 0
  futex(0xa2aa1b8, FUTEX_WAKE_PRIVATE, 1) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  read(12, "", 5) = 0
  futex(0xa2aa1b8, FUTEX_WAKE_PRIVATE, 1) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  read(12, "", 5) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  read(12, "", 5) = 0
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  read(12, "", 5) = 0
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
  futex(0xa2aa1b8, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  futex(0xa317238, FUTEX_WAIT_PRIVATE, 0, NULL) = 0