Thanks for this, David.

On Mon 2019-09-16 10:04:10 -0300, David Bremner wrote:
> It looks like every message; see attached log. I haven't had a chance to
> try the patched imap-dl

It looks like your server is indeed lying about the message size in the
initial summary. It clearly says 41997 octets here:

> command FETCH ('1:1', '(UID RFC822.SIZE)') response ['1 (UID 94515
> RFC822.SIZE 41997)']
> _parse_imapattrresponse() [_retrieverbases.py:1334] parsing attributes
> response line 1 (UID 94515 RFC822.SIZE 41997)
> _parse_imapattrresponse() [_retrieverbases.py:1357] got {'rfc822.size':
> '41997', 'uid': '94515'}

and then when you go to retrieve it, it gives you only 6294 octets:

> retrieving body for message "94515"
> _parse_imapuidcmdresponse() [_retrieverbases.py:1314] trace
> command uid FETCH ('94515', '(BODY.PEEK[])') response [('1 (BODY[] {6294}',

but then getmail claims to have delivered 41997 octets:

> msg 1/1 (41997 bytes) delivered, deleted
> 1 messages (41997 bytes) retrieved, 0 skipped

This is all very weird. Can you verify the size of the actual message
as delivered?

    stat "$(notmuch search --output=files id:87sgowo0w8....@tethera.net)"

If the delivered file is also only 6294 octets, then your server lies
and getmail is just confused, and perhaps we need to report a bug to
getmail.

But more to the point here, if you want to use imap-dl i suppose we
have a few options:

 a) i can make imap-dl not care about this confirmation check at all
    (or maybe just warn instead of throwing an exception)

 b) i can make imap-dl skip this check based on an option in the config
    file (options.ignore_size_mismatch). This makes the config file
    diverge a bit from getmail, but it looks like getmail is happy to
    ignore the extra config var.

I'm leaning toward (b) -- would you be willing to set that flag in your
config file?

You might be wondering why imap-dl cares, given that getmail seems to
ignore the mismatch. At the moment, the size mismatch check doesn't do
anything functional.

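As a rough sketch of what option (b) could look like (this is not
imap-dl's actual code): the function names below are hypothetical, and
it assumes that "options.ignore_size_mismatch" means an
ignore_size_mismatch key in the [options] section of a getmail-style
config file, defaulting to off when absent.

```python
import configparser
import logging


def size_check_is_relaxed(config_text: str) -> bool:
    """Return True if the (assumed) [options] ignore_size_mismatch flag is set."""
    # interpolation=None so that stray '%' characters in other values
    # (passwords, paths) cannot trip up the parser.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # fallback covers both a missing [options] section and a missing key
    return parser.getboolean('options', 'ignore_size_mismatch', fallback=False)


def check_size(uid: str, summary_size: int, fetched_size: int,
               relaxed: bool) -> bool:
    """Compare the RFC822.SIZE from the summary with the fetched body length.

    Returns True when they agree. On a mismatch, warns and returns False
    if relaxed is set; otherwise raises ValueError.
    """
    if summary_size == fetched_size:
        return True
    msg = (f'UID {uid}: summary reported {summary_size} octets, '
           f'but FETCH returned {fetched_size}')
    if relaxed:
        logging.warning(msg)
        return False
    raise ValueError(msg)
```

With the numbers from the log above, check_size('94515', 41997, 6294,
relaxed=True) would only log a warning, while relaxed=False would raise.
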
The reason i have it in place is that i imagine that in future
versions, i'd want to try to pull down messages in batches, so that we
don't have to worry about buffering all the messages in RAM before
starting to save them to disk.

getmail's behavior is to pull messages one at a time, but this approach
adds at least one additional round trip to the server between messages.
If you're pulling 60 messages totalling 500KiB over a 1Mbps connection,
then the time to pull the messages due to overall bandwidth constraints
is ~4s. If round-trip latency to the server is 85ms, adding one RTT per
message induces an extra ~5s of lag (60 × 85ms) -- more than doubling
the time it takes to fetch the mailbox. Yuck! So we don't want to pull
the messages one at a time.

If we could rely on the message sizes reported from the server, then we
could retrieve messages in tranches of "reasonable" size -- enough to
not worry about RAM exhaustion, and to also be able to deliver (and if
configured, delete) some messages even if the whole list hasn't been
retrieved and deleted yet. I was thinking that a reasonable "tranche
size" would be something like 5MiB or 10MiB.

You'd build tranches based on the summary, starting with the first
unfetched message, then considering each subsequent message until it
would total more than the tranche size. Then for each tranche, retrieve
the messages from that tranche, store them locally, and expunge them if
configured. Then on to the next tranche until you've exhausted the
summary.

But if the server lies about message size, then i don't really know how
to calculate tranche size realistically. I suppose if the user has
specified ignore_size_mismatch, we can do whatever we want for tranche
sizes, but that makes me kind of sad.

If i had this tranching mechanism in place, what would you want imap-dl
to do when talking to such a lying server?

        --dkg
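
For illustration only, the tranche-building step described above could
be sketched roughly as follows. This is not code from imap-dl: the name
build_tranches and the (uid, size) summary shape are assumptions, and
the sizes are taken at face value from the server's RFC822.SIZE
summary, which is exactly what a lying server breaks.

```python
from typing import Iterable, Iterator, List, Tuple

FIVE_MIB = 5 * 1024 * 1024


def build_tranches(summary: Iterable[Tuple[str, int]],
                   tranche_size: int = FIVE_MIB) -> Iterator[List[str]]:
    """Group UIDs into consecutive tranches from a (uid, size) summary.

    Each tranche starts with the first unfetched message and grows until
    adding the next message would push the total past tranche_size. A
    single message larger than tranche_size ends up in a tranche of its
    own, so every message is still fetched.
    """
    tranche: List[str] = []
    total = 0
    for uid, size in summary:
        # Close the current tranche if this message would overflow it.
        if tranche and total + size > tranche_size:
            yield tranche
            tranche, total = [], 0
        tranche.append(uid)
        total += size
    if tranche:
        yield tranche
```

Each yielded list would then be fetched in one go, stored locally, and
expunged if configured, before moving on to the next one.
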