Thanks for this, David.

On Mon 2019-09-16 10:04:10 -0300, David Bremner wrote:
> It looks like every message; see attached log. I haven't had a chance to
> try the patched imap-dl

It looks like your server is indeed lying about the message size in the
initial summary.

It clearly says 41997 octets here:

> command FETCH ('1:1', '(UID RFC822.SIZE)') response ['1 (UID 94515 
> RFC822.SIZE 41997)']
> _parse_imapattrresponse() [_retrieverbases.py:1334] parsing attributes 
> response line 1 (UID 94515 RFC822.SIZE 41997)
> _parse_imapattrresponse() [_retrieverbases.py:1357] got {'rfc822.size': 
> '41997', 'uid': '94515'}

and then when you go to retrieve it, it gives you only 6294 octets:

> retrieving body for message "94515"
> _parse_imapuidcmdresponse() [_retrieverbases.py:1314] trace
> command uid FETCH ('94515', '(BODY.PEEK[])') response [('1 (BODY[] {6294}',

but then getmail claims to have delivered 41997 octets:

>  msg 1/1 (41997 bytes) delivered, deleted
>  1 messages (41997 bytes) retrieved, 0 skipped

This is all very weird.

Can you verify the size of the actual message as delivered?

    stat "$(notmuch search --output=files id:87sgowo0w8....@tethera.net)"

If the delivered file really is only 6294 octets, then your server lies
and getmail is just confused, and perhaps we need to report a bug to
getmail.

But more to the point here, if you want to use imap-dl i suppose we have
a few options:

 a) i can make imap-dl not care about this confirmation check at all (or
    maybe just warn instead of throw an exception)

 b) i can make imap-dl skip this check based on an option in the config
    file (options.ignore_size_mismatch).  This makes the config file
    diverge a bit from getmail, but it looks like getmail is happy to
    ignore the extra config var.

I'm leaning toward (b) -- would you be willing to set that flag in your
config file?
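
Roughly, (b) could look like the Python sketch below.  This is not
imap-dl's actual code: check_size() is a hypothetical helper, and the
only name taken from the proposal above is the ignore_size_mismatch key
in the [options] section.  It warns when the flag is set and raises
otherwise.

```python
import configparser
import logging


def check_size(reported: int, actual: int, ignore_mismatch: bool) -> None:
    """Compare the size from the summary with the octets actually fetched.

    Hypothetical helper: warn if the user opted out of the check,
    otherwise raise.
    """
    if reported == actual:
        return
    msg = f"size mismatch: summary said {reported} octets, fetched {actual}"
    if ignore_mismatch:
        logging.warning(msg)
    else:
        raise ValueError(msg)


# A getmail-style config fragment carrying the proposed extra option.
conf = configparser.ConfigParser()
conf.read_string("""
[options]
ignore_size_mismatch = true
""")

# Default to the strict behavior when the option is absent.
ignore = conf.getboolean("options", "ignore_size_mismatch", fallback=False)

# With the flag set, the mismatch from the log above only logs a warning.
check_size(41997, 6294, ignore)
```

Defaulting to False keeps the strict check for everyone who has not
explicitly opted out.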

You might be wondering why imap-dl cares, given that getmail seems to
ignore the mismatch.  At the moment, the size mismatch check doesn't do
anything functional.

The reason i have it in place is that i imagine that in future versions,
i'd want to try to pull down messages in batches, so that we don't have
to worry about buffering all the messages in RAM before starting to save
them to disk.

getmail's behavior is to pull messages one at a time, but this approach
adds at least one additional round trip to the server between messages.
If you're pulling 60 messages totalling 500KiB over a 1Mbps connection,
then the time to pull the messages due to overall bandwidth constraints
is ~4s. If round-trip latency to the server is 85ms, adding 1RTT per
message induces an extra ~5s lag -- more than doubling the time it takes
to fetch the mailbox.  Yuck!  So we don't want to pull the messages one
at a time.
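
The back-of-the-envelope numbers can be reproduced with a few lines of
Python.  The figures are the ones from the paragraph above; the only
assumption is that 1Mbps means 1,000,000 bits per second and that
protocol overhead is ignored.

```python
messages = 60
total_bits = 500 * 1024 * 8        # 500 KiB expressed in bits
bandwidth_bps = 1_000_000          # 1 Mbps, assuming decimal megabits
rtt_s = 0.085                      # 85 ms round trip

transfer_s = total_bits / bandwidth_bps   # time spent moving the data
latency_s = messages * rtt_s              # one extra round trip per message

print(f"transfer: {transfer_s:.1f}s, added latency: {latency_s:.1f}s")
print(f"slowdown: {(transfer_s + latency_s) / transfer_s:.2f}x")
```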

If we could rely on the message sizes reported from the server, then we
could retrieve messages in tranches of "reasonable" size -- enough to
not worry about RAM exhaustion, and to also be able to deliver (and if
configured, delete) some messages even if the whole list hasn't been
retrieved and deleted yet.  I was thinking that a reasonable "tranche
size" would be something like 5MiB or 10MiB.  You'd build tranches based
on the summary, starting with the first unfetched message, then
considering each subsequent message until it would total more than the
tranche size.  Then for each tranche, retrieve the messages from that
tranche, store them locally, and expunge them if configured.  Then on to
the next tranche until you've exhausted the summary.
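
The tranche-building step described above might be sketched like this in
Python.  Nothing here exists in imap-dl yet: build_tranches() and its
parameters are hypothetical, the summary is assumed to be a list of
(uid, reported size) pairs, and a single message larger than the tranche
size is assumed to get a tranche of its own.

```python
from typing import Iterable, Iterator, List, Tuple


def build_tranches(summary: Iterable[Tuple[int, int]],
                   tranche_size: int) -> Iterator[List[int]]:
    """Group message UIDs into tranches by reported size.

    summary: (uid, reported_size_in_octets) pairs, in mailbox order.
    A message is added to the current tranche unless doing so would
    push the total past tranche_size; an oversized message ends up
    alone in its own tranche.
    """
    tranche: List[int] = []
    total = 0
    for uid, size in summary:
        if tranche and total + size > tranche_size:
            yield tranche
            tranche, total = [], 0
        tranche.append(uid)
        total += size
    if tranche:
        yield tranche


# Example with made-up sizes and a 10-octet tranche size.
example = [(1, 4), (2, 4), (3, 4), (4, 20), (5, 1)]
for uids in build_tranches(example, 10):
    # Here one would fetch, store, and (if configured) expunge uids.
    print(uids)
```

Note that this only bounds memory use if the reported sizes are
trustworthy, which is exactly the problem with a server like this one.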

But if the server lies about message size, then i don't really know how
to calculate tranche size realistically.  I suppose if the user has
specified ignore_size_mismatch, we can do whatever we want for tranche
sizes, but that makes me kind of sad.  If i had this tranching mechanism
in place, what would you want imap-dl to do when talking to such a lying
server?

        --dkg
