On Sun 2019-10-06 14:18:16 -0400, Daniel Kahn Gillmor wrote:
> On Sat 2019-10-05 10:21:05 -0700, Sean Whitton wrote:
>
>> As an alternative to adding the integration tests, how about you use
>> imap-dl on a daily basis for ~3 months with (I assume) a standard IMAP
>> server, and if you don't have to make any nontrivial changes to the
>> script during that time, we can ship it in mailscripts?
>
> i'm using imap-dl on a daily basis now, and will happily report back
> then too. the last changes i made were supplying more debugging details
> on october 2nd, so i suppose the new year is ~3 months if you want to
> start the clock from my last changes :)

It's now been three months, and the only changes i've made to imap-dl
have been trivial ones:

 - accept that some IMAP daemons will lie about message sizes before
   download, offer workaround for users stuck with those servers
   (thanks, bremner)
 - some grammar cleanup (thanks, Clint)
 - auto-create the target maildir if it isn't already present
   (thanks, jrollins)
 - produce sensible --help output (thanks, jrollins)
 - clean up internal typechecking
 - add tab completion in bash

The whole series is available on the "imap-dl" branch at
https://salsa.debian.org/dkg/mailscripts.git, or if you prefer to
import it as a single patch, i'm attaching that here.

Thanks for maintaining mailscripts,

        --dkg

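For anyone stuck with a server that misreports sizes, the workaround
mentioned above is the options.on_size_mismatch setting in the config
file (exception, warn, or none; exception is the default). Here is a
minimal sketch of how such a setting can be read with configparser.
The function name and the sample config text are made up for
illustration and are not part of the patch. Unlike imap-dl, which only
complains about an unknown value once a mismatch actually happens,
this sketch validates the value up front.

```python
import configparser

# Sample getmail-style config; values are placeholders.
SAMPLE = '''
[retriever]
server = mail.example.net
username = foo
password = sekr1t!

[destination]
path = /home/foo/Maildir

[options]
delete = True
on_size_mismatch = warn
'''

def size_mismatch_policy(text: str) -> str:
    """Hypothetical helper: return the configured size-mismatch policy."""
    conf = configparser.ConfigParser()
    conf.read_string(text)
    # Same key and same default as the patch uses.
    policy = conf.get('options', 'on_size_mismatch', fallback='exception').lower()
    if policy not in ('exception', 'warn', 'none'):
        raise ValueError('on_size_mismatch should be none, warn, or exception '
                         '(found "%s")' % (policy,))
    return policy

print(size_mismatch_policy(SAMPLE))
```

With the sample above the policy is "warn"; dropping the
on_size_mismatch line falls back to "exception".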
diff --git a/Makefile b/Makefile
index af30616..ec3d851 100644
--- a/Makefile
+++ b/Makefile
@@ -1,15 +1,17 @@
 MANPAGES=mdmv.1 mbox2maildir.1 \
 	notmuch-slurp-debbug.1 notmuch-extract-patch.1 maildir-import-patch.1 \
+	imap-dl.1 \
 	email-extract-openpgp-certs.1 \
 	email-print-mime-structure.1 \
 	notmuch-import-patch.1
-COMPLETIONS=completions/bash/email-print-mime-structure
+COMPLETIONS=completions/bash/email-print-mime-structure completions/bash/imap-dl
 
 all: $(MANPAGES) $(COMPLETIONS)
 
 check:
 	./tests/email-print-mime-structure.sh
 	mypy --strict ./email-print-mime-structure
+	mypy --strict ./imap-dl
 
 clean:
 	rm -f $(MANPAGES)
diff --git a/debian/control b/debian/control
index bc8268a..21afa45 100644
--- a/debian/control
+++ b/debian/control
@@ -77,3 +77,5 @@ Description: collection of scripts for manipulating e-mail on Debian
    email-print-mime-structure -- tree view of a message's MIME structure
  .
    email-extract-openpgp-certs -- extract OpenPGP certificates from a message
+ .
+   imap-dl -- download messages from an IMAP mailbox to a maildir
diff --git a/debian/mailscripts.bash-completion b/debian/mailscripts.bash-completion
index 435576f..657de01 100644
--- a/debian/mailscripts.bash-completion
+++ b/debian/mailscripts.bash-completion
@@ -1 +1,2 @@
 completions/bash/email-print-mime-structure
+completions/bash/imap-dl
diff --git a/debian/mailscripts.install b/debian/mailscripts.install
index 2c060df..3739c49 100644
--- a/debian/mailscripts.install
+++ b/debian/mailscripts.install
@@ -1,5 +1,6 @@
 email-extract-openpgp-certs /usr/bin
 email-print-mime-structure /usr/bin
+imap-dl /usr/bin
 maildir-import-patch /usr/bin
 mbox2maildir /usr/bin
 mdmv /usr/bin
diff --git a/debian/mailscripts.manpages b/debian/mailscripts.manpages
index 1de088f..a915617 100644
--- a/debian/mailscripts.manpages
+++ b/debian/mailscripts.manpages
@@ -1,5 +1,6 @@
 email-extract-openpgp-certs.1
 email-print-mime-structure.1
+imap-dl.1
 maildir-import-patch.1
 mbox2maildir.1
 mdmv.1
diff --git a/imap-dl b/imap-dl
new file mode 100755
index 0000000..f5d7a85
--- /dev/null
+++ b/imap-dl
@@ -0,0 +1,254 @@
+#!/usr/bin/python3
+# PYTHON_ARGCOMPLETE_OK
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2019 Daniel Kahn Gillmor
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or (at
+# your option) any later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+DESCRIPTION = '''A simple replacement for a minimalist use of getmail.
+
+In particular, if you use getmail to reach an IMAP server as though it
+were POP (retrieving from the server and optionally deleting), you can
+point this script to the getmail config and it should do the same
+thing.
+
+It tries to ensure that the configuration file is of the expected
+type, and will terminate by raising an exception if it is not. It
+should not lose messages.
+
+If there's any interest in supporting other use cases for getmail,
+patches are welcome.
+
+If you've never used getmail, you can make the simplest possible
+config file like so:
+
+----------
+[retriever]
+server = mail.example.net
+username = foo
+password = sekr1t!
+
+[destination]
+path = /home/foo/Maildir
+
+[options]
+delete = True
+----------
+'''
+
+import re
+import sys
+import ssl
+import time
+import imaplib
+import logging
+import mailbox
+import os.path
+import argparse
+import statistics
+import configparser
+
+from typing import Dict, List
+
+try:
+    import argcomplete #type: ignore
+except ImportError:
+    argcomplete = None
+
+_summary_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) RFC822.SIZE (?P<size>[0-9]+)\)$')
+def break_fetch_summary(line:bytes) -> Dict[str,int]:
+    '''b'1 (UID 160 RFC822.SIZE 1867)' -> {id: 1, uid: 160, size: 1867}'''
+    match = _summary_splitter.match(line)
+    if not match:
+        raise Exception(f'malformed summary line {line!r}')
+    ret:Dict[str,int] = {}
+    i:str
+    for i in ['id', 'uid', 'size']:
+        ret[i] = int(match[i])
+    return ret
+
+_fetch_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) (FLAGS \([\\A-Za-z ]*\) )?BODY\[\] \{(?P<size>[0-9]+)\}$')
+def break_fetch(line:bytes) -> Dict[str,int]:
+    '''b'1 (UID 160 BODY[] {1867}' -> {id: 1, uid: 160, size: 1867}'''
+    match = _fetch_splitter.match(line)
+    if not match:
+        raise Exception(f'malformed fetch line {line!r}')
+    ret:Dict[str,int] = {}
+    i:str
+    for i in ['id', 'uid', 'size']:
+        ret[i] = int(match[i])
+    return ret
+
+def pull_msgs(configfile:str, verbose:bool) -> None:
+    conf = configparser.ConfigParser()
+    conf.read_file(open(configfile, 'r'))
+    oldloglevel = logging.getLogger().getEffectiveLevel()
+    conf_verbose = conf.getint('options', 'verbose', fallback=1)
+    if conf_verbose > 1:
+        logging.getLogger().setLevel(logging.INFO)
+    logging.info('pulling from config file %s', configfile)
+    delete = conf.getboolean('options', 'delete', fallback=False)
+    read_all = conf.getboolean('options', 'read_all', fallback=True)
+    if not read_all:
+        raise NotImplementedError('imap-dl only supports options.read_all=True, got False')
+    rtype = conf.get('retriever', 'type', fallback='SimpleIMAPSSLRetriever')
+    if rtype.lower() != 'simpleimapsslretriever':
+        raise NotImplementedError('imap-dl only supports retriever.type=SimpleIMAPSSLRetriever, got %s'%(rtype,))
+    # FIXME: handle `retriever.record_mailbox`
+    dtype = conf.get('destination', 'type', fallback='Maildir')
+    if dtype.lower() != 'maildir':
+        raise NotImplementedError('imap-dl only supports destination.type=Maildir, got %s'%(dtype,))
+    dst = conf.get('destination', 'path')
+    dst = os.path.expanduser(dst)
+    if os.path.exists(dst) and not os.path.isdir(dst):
+        raise Exception('expected destination directory, but %s is not a directory'%(dst,))
+    mdst = mailbox.Maildir(dst, create=True)
+    ca_certs = conf.get('retriever', 'ca_certs', fallback=None)
+    on_size_mismatch = conf.get('options', 'on_size_mismatch', fallback='exception').lower()
+    sizes_mismatched:List[int] = []
+    ctx = ssl.create_default_context(cafile=ca_certs)
+    with imaplib.IMAP4_SSL(host=conf.get('retriever', 'server'), #type: ignore
+                           port=int(conf.get('retriever', 'port', fallback=993)),
+                           ssl_context=ctx) as imap:
+        logging.info("Logging in as %s", conf.get('retriever', 'username'))
+        resp = imap.login(conf.get('retriever', 'username'),
+                          conf.get('retriever', 'password'))
+        if resp[0] != 'OK':
+            raise Exception('login failed with %s as user %s on %s'%(
+                resp,
+                conf.get('retriever', 'username'),
+                conf.get('retriever', 'server')))
+        if verbose: # only enable debugging after login to avoid leaking credentials in the log
+            imap.debug = 4
+        logging.info("capabilities reported: %s", ', '.join(imap.capabilities))
+        resp = imap.select(readonly=not delete)
+        if resp[0] != 'OK':
+            raise Exception('selection failed: %s'%(resp,))
+        if len(resp[1]) != 1:
+            raise Exception('expected exactly one EXISTS response from select, got %s'%(resp[1]))
+        n = int(resp[1][0])
+        if n == 0:
+            logging.info('No messages to retrieve')
+            logging.getLogger().setLevel(oldloglevel)
+            return
+        resp = imap.fetch('1:%d'%(n), '(UID RFC822.SIZE)')
+        if resp[0] != 'OK':
+            raise Exception('initial FETCH 1:%d not OK (%s)'%(n, resp))
+        pending = list(map(break_fetch_summary, resp[1]))
+        sizes:Dict[int,int] = {}
+        for m in pending:
+            sizes[m['uid']] = m['size']
+        fetched:Dict[int,int] = {}
+        uids = ','.join(map(lambda x: str(x['uid']), sorted(pending, key=lambda x: x['uid'])))
+        totalbytes = sum([x['size'] for x in pending])
+        logging.info('Fetching %d messages, expecting %d bytes of message content',
+                     len(pending), totalbytes)
+        # FIXME: sort by size?
+        # FIXME: fetch in batches or singly instead of all-at-once?
+        # FIXME: rolling deletion?
+        # FIXME: asynchronous work?
+        before = time.perf_counter()
+        resp = imap.uid('FETCH', uids, '(UID BODY.PEEK[])')
+        after = time.perf_counter()
+        if resp[0] != 'OK':
+            raise Exception('UID fetch failed %s'%(resp[0]))
+        for f in resp[1]:
+            # these objects are weirdly structured. i don't know why
+            # these trailing close-parens show up. so this is very
+            # ad-hoc and nonsense
+            if isinstance(f, bytes):
+                if f != b')':
+                    raise Exception('got bytes object of length %d but expected simple closeparen'%(len(f),))
+            elif isinstance(f, tuple):
+                if len(f) != 2:
+                    raise Exception('expected 2-part tuple, got %d-part'%(len(f),))
+                m = break_fetch(f[0])
+                if m['size'] != len(f[1]):
+                    raise Exception('expected %d octets, got %d'%(
+                        m['size'], len(f[1])))
+                if m['size'] != sizes[m['uid']]:
+                    if on_size_mismatch == 'warn':
+                        if len(sizes_mismatched) == 0:
+                            logging.warning('size mismatch: summary said %d octets, fetch sent %d',
+                                            sizes[m['uid']], m['size'])
+                        elif len(sizes_mismatched) == 1:
+                            logging.warning('size mismatch: (mismatches after the first suppressed until summary)')
+                        sizes_mismatched.append(sizes[m['uid']] - m['size'])
+                    elif on_size_mismatch == 'exception':
+                        raise Exception('size mismatch: summary said %d octets, fetch sent %d\n(set options.on_size_mismatch to none or warn to avoid hard failure)'%(
+                            sizes[m['uid']], m['size']))
+                    elif on_size_mismatch != 'none':
+                        raise Exception('size_mismatch: options.on_size_mismatch should be none, warn, or exception (found "%s")'%(on_size_mismatch,))
+                fname = mdst.add(f[1].replace(b'\r\n', b'\n'))
+                logging.info('stored message %d/%d (uid %d, %d bytes) in %s',
+                             len(fetched) + 1, len(pending), m['uid'], m['size'], fname)
+                del sizes[m['uid']]
+                fetched[m['uid']] = m['size']
+        if sizes:
+            logging.warning('unhandled UIDs: %s', sizes)
+        logging.info('%d bytes of %d messages fetched in %g seconds (~%g KB/s)',
+                     sum(fetched.values()), len(fetched), after - before,
+                     sum(fetched.values())/((after - before)*1024))
+        if on_size_mismatch == 'warn' and len(sizes_mismatched) > 1:
+            logging.warning('%d size mismatches out of %d messages (mismatches in bytes: mean %f, stddev %f)',
+                            len(sizes_mismatched), len(fetched),
+                            statistics.mean(sizes_mismatched),
+                            statistics.stdev(sizes_mismatched))
+        if delete:
+            logging.info('trying to delete %d messages from IMAP store', len(fetched))
+            resp = imap.uid('STORE', ','.join(map(str, fetched.keys())), '+FLAGS', r'(\Deleted)')
+            if resp[0] != 'OK':
+                raise Exception('failed to set \\Deleted flag: %s'%(resp))
+            resp = imap.expunge()
+            if resp[0] != 'OK':
+                raise Exception('failed to expunge! %s'%(resp))
+        else:
+            logging.info('not deleting any messages, since options.delete is not set')
+        logging.getLogger().setLevel(oldloglevel)
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(
+        description=DESCRIPTION,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        'config', nargs='+', metavar='CONFIG',
+        help="configuration file")
+    parser.add_argument(
+        '-v', '--verbose', action='store_true',
+        help="verbose log output")
+
+    if argcomplete:
+        argcomplete.autocomplete(parser)
+    elif '_ARGCOMPLETE' in os.environ:
+        logging.error('Argument completion requested but the "argcomplete" '
+                      'module is not installed. '
+                      'Maybe you want to "apt install python3-argcomplete"')
+        sys.exit(1)
+
+    args = parser.parse_args()
+
+    if args.verbose:
+        logging.getLogger().setLevel(logging.INFO)
+
+    errs = {}
+    for confname in args.config:
+        try:
+            pull_msgs(confname, args.verbose)
+        except imaplib.IMAP4.error as e:
+            logging.error('IMAP failure for config file %s: %s', confname, e)
+            errs[confname] = e
+    if errs:
+        sys.exit(1)
diff --git a/imap-dl.1.pod b/imap-dl.1.pod
new file mode 100644
index 0000000..1407d05
--- /dev/null
+++ b/imap-dl.1.pod
@@ -0,0 +1,88 @@
+=encoding utf8
+
+=head1 NAME
+
+imap-dl -- a simple replacement for a minimalist use of getmail
+
+=head1 SYNOPSIS
+
+B<imap-dl> [B<-v>|B<--verbose>] B<configfile>...
+
+=head1 DESCRIPTION
+
+If you use getmail to reach an IMAP server as though it were POP
+(retrieving from the server, storing it in a maildir and optionally
+deleting), you can point this script to the getmail config and it
+should do the same thing.
+
+It tries to ensure that the configuration file is of the expected
+type, and will terminate by raising an exception if it is not. It
+should not lose messages.
+
+If there's any interest in supporting other use cases for getmail,
+patches are welcome.
+
+=head1 OPTIONS
+
+B<-v> or B<--verbose> causes B<imap-dl> to print more details
+about what it is doing.
+
+In addition to parts of the standard B<getmail> configuration,
+B<imap-dl> supports the following keywords in the config file:
+
+B<options.on_size_mismatch> can be set to B<exception>, B<none>, or
+B<warn>. This governs what to do when the remote IMAP server claims a
+different size in the message summary list than the actual message
+retrieval (default: B<exception>).
+
+=head1 EXAMPLE CONFIG
+
+If you've never used getmail, you can make the simplest possible
+config file like so:
+
+=over 4
+
+    [retriever]
+    server = mail.example.net
+    username = foo
+    password = sekr1t!
+
+    [destination]
+    path = /home/foo/Maildir
+
+    [options]
+    delete = True
+
+=back
+
+=head1 LIMITATIONS
+
+B<imap-dl> is currently deliberately minimal. It is designed to be
+used by someone who treats their IMAP mailbox like a POP server.
+
+It works with IMAP-over-TLS only, and it just fetches all messages
+from the default IMAP folder. It does not support all the various
+features of getmail.
+
+B<imap-dl> is deliberately implemented in a modern version of python3,
+and tries to just use the standard library. It will not be backported
+to python2.
+
+B<imap-dl> uses imaplib, which means that it does synchronous calls to
+the imap server. A more clever implementation would use asynchronous
+python to avoid latency/roundtrips.
+
+B<imap-dl> does not know how to wait and listen for new mail using
+IMAP IDLE. This would be a nice additional feature.
+
+B<imap-dl> does not yet know how to deliver to an MDA (or to
+B<notmuch-insert>). This would be a nice thing to be able to do.
+
+=head1 SEE ALSO
+
+https://tools.ietf.org/html/rfc3501, http://pyropus.ca/software/getmail/
+
+=head1 AUTHOR
+
+B<imap-dl> and this manpage were written by Daniel Kahn Gillmor,
+inspired by some functionality from the getmail project.
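
For reviewers who want to poke at the summary-line parsing without an
IMAP server, here is a small standalone sketch. It copies the regular
expression from the patch; the helper name parse_summary and the
sample response bytes are illustrative only and are not part of
imap-dl itself.

```python
import re
from typing import Dict

# Same pattern as _summary_splitter in the patch.
SUMMARY = re.compile(
    rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) RFC822.SIZE (?P<size>[0-9]+)\)$')

def parse_summary(line: bytes) -> Dict[str, int]:
    """Hypothetical stand-in for break_fetch_summary() in the patch."""
    match = SUMMARY.match(line)
    if not match:
        raise ValueError(f'malformed summary line {line!r}')
    # Group names are str even though the pattern matches bytes.
    return {key: int(match[key]) for key in ('id', 'uid', 'size')}

print(parse_summary(b'1 (UID 160 RFC822.SIZE 1867)'))
# prints {'id': 1, 'uid': 160, 'size': 1867}
```

A line that does not fit the expected shape (extra FETCH items, for
example) raises instead of being silently skipped, which matches the
fail-loudly approach the script takes elsewhere.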