-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Am 13.09.2015 um 02:38 schrieb Mark Triggs:
> Dear Maintainer, > > Recently I've noticed that running 'fetchmail' to awake my > background fetchmail daemon is quite slow. It takes around 5 > seconds to return, where previously it was pretty much > instantaneous. Hi Mark, Nico forwarded your input to me as the upstream maintainer, and asking my input. > I ran fetchmail under callgrind and found that it was spending all > of its time loading my (large) .fetchids file into a linked list. > I've got a POP3 server with all messages kept on it, and my > .fetchids file contains around 50,000 entries (I should probably > switch to IMAP ;) Well, no...t yet. fetchmail does not yet implement client-side tracking of "seen" messages in IMAP. (There have been incomplete patches offered in the past but those neglected looking at the UIDVALIDITY and were thus unsuitable for deployment.) >> From what I can see, the issue is that each ID from .fetchids is > appended to a linked list (by initialize_saved_lists() in uid.c), > but the append operation (save_str) has to scan the entire linked > list from head to tail on every append operation. As a result, the > process of reading the file is O(N^2) instead of O(N). > > There's a struct member on 'struct query' called 'oldsavedend' > that was unused and seemed like it might have been intended to > track the end of the linked list, so I've modified uid.c to use > that variable--keeping track of the last node in the linked list > and avoiding the full scan. I've attached a patch for that, and > performance is back to instantaneous in my testing. > > It's a pretty minor thing, but easy enough to fix so I thought I'd > report it. Your assessment is correct, and I have been considering such a change as you are proposing, but rejected it. Ultimately, unless fetchmail changes data structures, it cannot get any better than O(n^2) because n times the linear search of a linear list with n elements remains in the code path, and bogs the entire UID handling still down to O(n^2). I have received and integrated a contribution by Rainer Weikusat into the upstream's Git master branch, and also let it afoot with 7.0.0 alpha releases since alpha2, which switches the entire list handling to use Patricia trees (radix trees) instead, which should AFAICT bring the entire UID handling to sane levels, O(n log n). For reference, the relevant places to look at are, in the upstream Git repository, in the uid_db.c and uid_db.h files. 1. 3e0432c01c37e7cd4f059be7dfd1a1ca2286683b - merge of feature branch (unsuitable for cherry-picking or thereabouts, and I don't mean to backport it because we need to forward with TLS, too) 2. e3b838cb8db9a4cb2f1449d5535a918282e6855d - fixes a regression in 3e0432c affecting POP3 polls. The Git repository resides at these places currently: https://gitlab.com/fetchmail/fetchmail http://sourceforge.net/p/fetchmail/git/ci/master/tree/ Hope that helps. Nico, I propose: confirmed upstream fixed-upstream. Best, Matthias -----BEGIN PGP SIGNATURE----- iQIcBAEBCgAGBQJWAyr1AAoJEOQSsVbv84Va1LoP/3wgA6WoBWcwJihYZxS0pESO Fbj8/tL13ziY/DfRaAb8hZa1ClWFHYaMwozHLB5ZDU6w/WQbouna9eCtnA4ztFO+ iTSx6KsuHtgpI7rmDVZK7fH5qf8Kgx/SmTQnJkIffNccQkqIUw+zbRVfbXZYusMj JewUhuI0nH3hLqXuWfWiHXOv1I2cEybYKsFuQcgdDx2lS1gaW76k0IqJ17tQx9Cq 00+PZQD3Q02jVAjcB5oJ7GRlXvh5relhtAo8jui2JTnfdTl0zfu0O3/L5fCt5jRX 7+Eqvkd3HNkcWH4jXwW63JT8G/US79Maj94h4AmKy8238eW3J/BHQ0jpxyPPOr2I BHJLj1jyWR77UGsU5kz12tLQZB4A3o08gj+IVRnZyyBJIaqyiA+LJ4enEtFYt3BI oc5vuFbnimmBIeEfOEQ0LIIr8/QAl1Q25mAB5nYoEGq8KizdUFfAQ1OIxa493z7j 3P/H+qTiTk5a1KfOuDw7uqwOU6MdBJcJgPDyUGNeVNgJLQLIcnv+ur8jpX0SQXWj LmReSsyMHBZwKvR0B/WLtH/IL19m4nmwPsNL+Ks1sI8dltCn8Jz+CeoXMb2l8/IC VBNrE8XIgRt3ownkxIwVQbB8/n4STyL5kxR342wNL3oHUP+fMDd4k+Coz3Dwxi3G LXKyfuU3psgKawwvOppJ =I+7o -----END PGP SIGNATURE-----