-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Am 13.09.2015 um 02:38 schrieb Mark Triggs:

> Dear Maintainer,
> 
> Recently I've noticed that running 'fetchmail' to awake my
> background fetchmail daemon is quite slow.  It takes around 5
> seconds to return, where previously it was pretty much
> instantaneous.

Hi Mark,

Nico forwarded your input to me as the upstream maintainer, and asking
my input.

> I ran fetchmail under callgrind and found that it was spending all
> of its time loading my (large) .fetchids file into a linked list.
> I've got a POP3 server with all messages kept on it, and my
> .fetchids file contains around 50,000 entries (I should probably
> switch to IMAP ;)

Well, no...t yet. fetchmail does not yet implement client-side
tracking of "seen" messages in IMAP.  (There have been incomplete
patches offered in the past but those neglected looking at the
UIDVALIDITY and were thus unsuitable for deployment.)

>> From what I can see, the issue is that each ID from .fetchids is
> appended to a linked list (by initialize_saved_lists() in uid.c),
> but the append operation (save_str) has to scan the entire linked
> list from head to tail on every append operation.  As a result, the
> process of reading the file is O(N^2) instead of O(N).
> 
> There's a struct member on 'struct query' called 'oldsavedend'
> that was unused and seemed like it might have been intended to
> track the end of the linked list, so I've modified uid.c to use
> that variable--keeping track of the last node in the linked list
> and avoiding the full scan.  I've attached a patch for that, and 
> performance is back to instantaneous in my testing.
> 
> It's a pretty minor thing, but easy enough to fix so I thought I'd 
> report it.

Your assessment is correct, and I have been considering such a change
as you are proposing, but rejected it.  Ultimately, unless fetchmail
changes data structures, it cannot get any better than O(n^2) because
n times the linear search of a linear list with n elements remains in
the code path, and bogs the entire UID handling still down to O(n^2).

I have received and integrated a contribution by Rainer Weikusat into
the upstream's Git master branch, and also let it afoot with 7.0.0
alpha releases since alpha2, which switches the entire list handling
to use Patricia trees (radix trees) instead, which should AFAICT bring
the entire UID handling to sane levels, O(n log n).

For reference, the relevant places to look at are, in the upstream Git
repository, in the uid_db.c and uid_db.h files.

1. 3e0432c01c37e7cd4f059be7dfd1a1ca2286683b - merge of feature branch
(unsuitable for cherry-picking or thereabouts, and I don't mean to
backport it because we need to forward with TLS, too)

2. e3b838cb8db9a4cb2f1449d5535a918282e6855d - fixes a regression in
3e0432c affecting POP3 polls.

The Git repository resides at these places currently:

https://gitlab.com/fetchmail/fetchmail
http://sourceforge.net/p/fetchmail/git/ci/master/tree/

Hope that helps.


Nico, I propose: confirmed upstream fixed-upstream.

Best,
Matthias
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJWAyr1AAoJEOQSsVbv84Va1LoP/3wgA6WoBWcwJihYZxS0pESO
Fbj8/tL13ziY/DfRaAb8hZa1ClWFHYaMwozHLB5ZDU6w/WQbouna9eCtnA4ztFO+
iTSx6KsuHtgpI7rmDVZK7fH5qf8Kgx/SmTQnJkIffNccQkqIUw+zbRVfbXZYusMj
JewUhuI0nH3hLqXuWfWiHXOv1I2cEybYKsFuQcgdDx2lS1gaW76k0IqJ17tQx9Cq
00+PZQD3Q02jVAjcB5oJ7GRlXvh5relhtAo8jui2JTnfdTl0zfu0O3/L5fCt5jRX
7+Eqvkd3HNkcWH4jXwW63JT8G/US79Maj94h4AmKy8238eW3J/BHQ0jpxyPPOr2I
BHJLj1jyWR77UGsU5kz12tLQZB4A3o08gj+IVRnZyyBJIaqyiA+LJ4enEtFYt3BI
oc5vuFbnimmBIeEfOEQ0LIIr8/QAl1Q25mAB5nYoEGq8KizdUFfAQ1OIxa493z7j
3P/H+qTiTk5a1KfOuDw7uqwOU6MdBJcJgPDyUGNeVNgJLQLIcnv+ur8jpX0SQXWj
LmReSsyMHBZwKvR0B/WLtH/IL19m4nmwPsNL+Ks1sI8dltCn8Jz+CeoXMb2l8/IC
VBNrE8XIgRt3ownkxIwVQbB8/n4STyL5kxR342wNL3oHUP+fMDd4k+Coz3Dwxi3G
LXKyfuU3psgKawwvOppJ
=I+7o
-----END PGP SIGNATURE-----

Reply via email to