On Sun 2019-10-06 14:18:16 -0400, Daniel Kahn Gillmor wrote:
> On Sat 2019-10-05 10:21:05 -0700, Sean Whitton wrote:
>
>> As an alternative to adding the integration tests, how about you use
>> imap-dl on a daily basis for ~3 months with (I assume) a standard IMAP
>> server, and if you don't have to make any nontrivial changes to the
>> script during that time, we can ship it in mailscripts?
>
> i'm using imap-dl on a daily basis now, and will happily report back
> then too.  the last changes i made were supplying more debugging details
> on october 2nd, so i suppose the new year is ~3 months if you want to
> start the clock from my last changes :)

It's now been three months, and the only changes i've made to imap-dl
have been trivial ones:

 - accept that some IMAP daemons will lie about message sizes before
   download, offer workaround for users stuck with those servers
   (thanks, bremner)
 - some grammar cleanup (thanks, Clint)
 - auto-creating the target maildir if it wasn't already present
   (thanks, jrollins)
 - produce sensible --help output
   (thanks, jrollins)
 - clean up internal typechecking
 - add tab completion in bash
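
For anyone stuck with a server that misreports sizes, here is a minimal
sketch of how the workaround is selected from the config file.  It assumes
the option name and default from the attached patch
(options.on_size_mismatch, defaulting to "exception"); the helper
size_mismatch_policy and the sample config values are hypothetical and are
not part of imap-dl itself, which only validates the value when a mismatch
actually occurs.

```python
import configparser

# Hypothetical config text: server, credentials and path are placeholders.
EXAMPLE_CONFIG = """
[retriever]
server = mail.example.net
username = foo
password = sekr1t!

[destination]
path = /home/foo/Maildir

[options]
delete = True
on_size_mismatch = warn
"""

VALID_POLICIES = ('exception', 'warn', 'none')

def size_mismatch_policy(conf: configparser.ConfigParser) -> str:
    '''Return the configured policy, falling back to 'exception'.

    Hypothetical helper: unlike imap-dl, it rejects unknown values up front.
    '''
    policy = conf.get('options', 'on_size_mismatch', fallback='exception').lower()
    if policy not in VALID_POLICIES:
        raise ValueError('options.on_size_mismatch should be none, warn, '
                         'or exception (found "%s")' % (policy,))
    return policy

conf = configparser.ConfigParser()
conf.read_string(EXAMPLE_CONFIG)
print(size_mismatch_policy(conf))  # prints: warn
```

With "warn", a mismatch is logged and the message is still stored; with
"none" it is stored silently; with "exception" (the default) the run aborts.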

The whole series is available on the "imap-dl" branch at
https://salsa.debian.org/dkg/mailscripts.git, or if you prefer to import
it as a single patch, i'm attaching that here.

Thanks for maintaining mailscripts,

       --dkg

diff --git a/Makefile b/Makefile
index af30616..ec3d851 100644
--- a/Makefile
+++ b/Makefile
@@ -1,15 +1,17 @@
 MANPAGES=mdmv.1 mbox2maildir.1 \
 	notmuch-slurp-debbug.1 notmuch-extract-patch.1 maildir-import-patch.1 \
+	imap-dl.1 \
 	email-extract-openpgp-certs.1 \
 	email-print-mime-structure.1 \
 	notmuch-import-patch.1
-COMPLETIONS=completions/bash/email-print-mime-structure
+COMPLETIONS=completions/bash/email-print-mime-structure completions/bash/imap-dl
 
 all: $(MANPAGES) $(COMPLETIONS)
 
 check:
 	./tests/email-print-mime-structure.sh
 	mypy --strict ./email-print-mime-structure
+	mypy --strict ./imap-dl
 
 clean:
 	rm -f $(MANPAGES)
diff --git a/debian/control b/debian/control
index bc8268a..21afa45 100644
--- a/debian/control
+++ b/debian/control
@@ -77,3 +77,5 @@ Description: collection of scripts for manipulating e-mail on Debian
  email-print-mime-structure -- tree view of a message's MIME structure
  .
  email-extract-openpgp-certs -- extract OpenPGP certificates from a message
+ .
+ imap-dl -- download messages from an IMAP mailbox to a maildir
diff --git a/debian/mailscripts.bash-completion b/debian/mailscripts.bash-completion
index 435576f..657de01 100644
--- a/debian/mailscripts.bash-completion
+++ b/debian/mailscripts.bash-completion
@@ -1 +1,2 @@
 completions/bash/email-print-mime-structure
+completions/bash/imap-dl
diff --git a/debian/mailscripts.install b/debian/mailscripts.install
index 2c060df..3739c49 100644
--- a/debian/mailscripts.install
+++ b/debian/mailscripts.install
@@ -1,5 +1,6 @@
 email-extract-openpgp-certs /usr/bin
 email-print-mime-structure /usr/bin
+imap-dl /usr/bin
 maildir-import-patch /usr/bin
 mbox2maildir /usr/bin
 mdmv /usr/bin
diff --git a/debian/mailscripts.manpages b/debian/mailscripts.manpages
index 1de088f..a915617 100644
--- a/debian/mailscripts.manpages
+++ b/debian/mailscripts.manpages
@@ -1,5 +1,6 @@
 email-extract-openpgp-certs.1
 email-print-mime-structure.1
+imap-dl.1
 maildir-import-patch.1
 mbox2maildir.1
 mdmv.1
diff --git a/imap-dl b/imap-dl
new file mode 100755
index 0000000..f5d7a85
--- /dev/null
+++ b/imap-dl
@@ -0,0 +1,254 @@
+#!/usr/bin/python3
+# PYTHON_ARGCOMPLETE_OK
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2019 Daniel Kahn Gillmor
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or (at
+# your option) any later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+DESCRIPTION = '''A simple replacement for a minimalist use of getmail.
+
+In particular, if you use getmail to reach an IMAP server as though it
+were POP (retrieving from the server and optionally deleting), you can
+point this script to the getmail config and it should do the same
+thing.
+
+It tries to ensure that the configuration file is of the expected
+type, and will otherwise terminate by raising an exception; it should
+not lose messages.
+
+If there's any interest in supporting other use cases for getmail,
+patches are welcome.
+
+If you've never used getmail, you can make the simplest possible
+config file like so:
+
+----------
+[retriever]
+server = mail.example.net
+username = foo
+password = sekr1t!
+
+[destination]
+path = /home/foo/Maildir
+
+[options]
+delete = True
+----------
+'''
+
+import re
+import sys
+import ssl
+import time
+import imaplib
+import logging
+import mailbox
+import os.path
+import argparse
+import statistics
+import configparser
+
+from typing import Dict, List
+
+try:
+    import argcomplete #type: ignore
+except ImportError:
+    argcomplete = None
+
+_summary_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) RFC822.SIZE (?P<size>[0-9]+)\)$')
+def break_fetch_summary(line:bytes) -> Dict[str,int]:
+    '''b'1 (UID 160 RFC822.SIZE 1867)' -> {id: 1, uid: 160, size: 1867}'''
+    match = _summary_splitter.match(line)
+    if not match:
+        raise Exception(f'malformed summary line {line!r}')
+    ret:Dict[str,int] = {}
+    i:str
+    for i in ['id', 'uid', 'size']:
+        ret[i] = int(match[i])
+    return ret
+
+_fetch_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) (FLAGS \([\\A-Za-z ]*\) )?BODY\[\] \{(?P<size>[0-9]+)\}$')
+def break_fetch(line:bytes) -> Dict[str,int]:
+    '''b'1 (UID 160 BODY[] {1867}' -> {id: 1, uid: 160, size: 1867}'''
+    match = _fetch_splitter.match(line)
+    if not match:
+        raise Exception(f'malformed fetch line {line!r}')
+    ret:Dict[str,int] = {}
+    i:str
+    for i in ['id', 'uid', 'size']:
+        ret[i] = int(match[i])
+    return ret
+
+def pull_msgs(configfile:str, verbose:bool) -> None:
+    conf = configparser.ConfigParser()
+    conf.read_file(open(configfile, 'r'))
+    oldloglevel = logging.getLogger().getEffectiveLevel()
+    conf_verbose = conf.getint('options', 'verbose', fallback=1)
+    if conf_verbose > 1:
+        logging.getLogger().setLevel(logging.INFO)
+    logging.info('pulling from config file %s', configfile)
+    delete = conf.getboolean('options', 'delete', fallback=False)
+    read_all = conf.getboolean('options', 'read_all', fallback=True)
+    if not read_all:
+        raise NotImplementedError('imap-dl only supports options.read_all=True, got False')
+    rtype = conf.get('retriever', 'type', fallback='SimpleIMAPSSLRetriever')
+    if rtype.lower() != 'simpleimapsslretriever':
+        raise NotImplementedError('imap-dl only supports retriever.type=SimpleIMAPSSLRetriever, got %s'%(rtype,))
+    # FIXME: handle `retriever.record_mailbox`
+    dtype = conf.get('destination', 'type', fallback='Maildir')
+    if dtype.lower() != 'maildir':
+        raise NotImplementedError('imap-dl only supports destination.type=Maildir, got %s'%(dtype,))
+    dst = conf.get('destination', 'path')
+    dst = os.path.expanduser(dst)
+    if os.path.exists(dst) and not os.path.isdir(dst):
+        raise Exception('expected destination directory, but %s is not a directory'%(dst,))
+    mdst = mailbox.Maildir(dst, create=True)
+    ca_certs = conf.get('retriever', 'ca_certs', fallback=None)
+    on_size_mismatch = conf.get('options', 'on_size_mismatch', fallback='exception').lower()
+    sizes_mismatched:List[int] = []
+    ctx = ssl.create_default_context(cafile=ca_certs)
+    with imaplib.IMAP4_SSL(host=conf.get('retriever', 'server'), #type: ignore
+                           port=int(conf.get('retriever', 'port', fallback=993)),
+                           ssl_context=ctx) as imap:
+        logging.info("Logging in as %s", conf.get('retriever', 'username'))
+        resp = imap.login(conf.get('retriever', 'username'),
+                          conf.get('retriever', 'password'))
+        if resp[0] != 'OK':
+            raise Exception('login failed with %s as user %s on %s'%(
+                resp,
+                conf.get('retriever', 'username'),
+                conf.get('retriever', 'server')))
+        if verbose: # only enable debugging after login to avoid leaking credentials in the log
+            imap.debug = 4
+            logging.info("capabilities reported: %s", ', '.join(imap.capabilities))
+        resp = imap.select(readonly=not delete)
+        if resp[0] != 'OK':
+            raise Exception('selection failed: %s'%(resp,))
+        if len(resp[1]) != 1:
+            raise Exception('expected exactly one EXISTS response from select, got %s'%(resp[1]))
+        n = int(resp[1][0])
+        if n == 0:
+            logging.info('No messages to retrieve')
+            logging.getLogger().setLevel(oldloglevel)
+            return
+        resp = imap.fetch('1:%d'%(n), '(UID RFC822.SIZE)')
+        if resp[0] != 'OK':
+            raise Exception('initial FETCH 1:%d not OK (%s)'%(n, resp))
+        pending = list(map(break_fetch_summary, resp[1]))
+        sizes:Dict[int,int] = {}
+        for m in pending:
+            sizes[m['uid']] = m['size']
+        fetched:Dict[int,int] = {}
+        uids = ','.join(map(lambda x: str(x['uid']), sorted(pending, key=lambda x: x['uid'])))
+        totalbytes = sum([x['size'] for x in pending])
+        logging.info('Fetching %d messages, expecting %d bytes of message content',
+                     len(pending), totalbytes)
+        # FIXME: sort by size?
+        # FIXME: fetch in batches or singly instead of all-at-once?
+        # FIXME: rolling deletion?
+        # FIXME: asynchronous work?
+        before = time.perf_counter()
+        resp = imap.uid('FETCH', uids, '(UID BODY.PEEK[])')
+        after = time.perf_counter()
+        if resp[0] != 'OK':
+            raise Exception('UID fetch failed %s'%(resp[0]))
+        for f in resp[1]:
+            # these objects are weirdly structured. i don't know why
+            # these trailing close-parens show up.  so this is very
+            # ad-hoc and nonsense
+            if isinstance(f, bytes):
+                if f != b')':
+                    raise Exception('got bytes object of length %d but expected simple closeparen'%(len(f),))
+            elif isinstance(f, tuple):
+                if len(f) != 2:
+                    raise Exception('expected 2-part tuple, got %d-part'%(len(f),))
+                m = break_fetch(f[0])
+                if m['size'] != len(f[1]):
+                    raise Exception('expected %d octets, got %d'%(
+                        m['size'], len(f[1])))
+                if m['size'] != sizes[m['uid']]:
+                    if on_size_mismatch == 'warn':
+                        if len(sizes_mismatched) == 0:
+                            logging.warning('size mismatch: summary said %d octets, fetch sent %d',
+                                            sizes[m['uid']], m['size'])
+                        elif len(sizes_mismatched) == 1:
+                            logging.warning('size mismatch: (mismatches after the first suppressed until summary)')
+                        sizes_mismatched.append(sizes[m['uid']] - m['size'])
+                    elif on_size_mismatch == 'exception':
+                        raise Exception('size mismatch: summary said %d octets, fetch sent %d\n(set options.on_size_mismatch to none or warn to avoid hard failure)'%(
+                                        sizes[m['uid']], m['size']))
+                    elif on_size_mismatch != 'none':
+                        raise Exception('size_mismatch: options.on_size_mismatch should be none, warn, or exception (found "%s")'%(on_size_mismatch,))
+                fname = mdst.add(f[1].replace(b'\r\n', b'\n'))
+                logging.info('stored message %d/%d (uid %d, %d bytes) in %s',
+                             len(fetched) + 1, len(pending), m['uid'], m['size'], fname)
+                del sizes[m['uid']]
+                fetched[m['uid']] = m['size']
+        if sizes:
+            logging.warning('unhandled UIDs: %s', sizes)
+        logging.info('%d bytes of %d messages fetched in %g seconds (~%g KB/s)',
+                     sum(fetched.values()), len(fetched), after - before,
+                     sum(fetched.values())/((after - before)*1024))
+        if on_size_mismatch == 'warn' and len(sizes_mismatched) > 1:
+            logging.warning('%d size mismatches out of %d messages (mismatches in bytes: mean %f, stddev %f)',
+                            len(sizes_mismatched), len(fetched),
+                            statistics.mean(sizes_mismatched),
+                            statistics.stdev(sizes_mismatched))
+        if delete:
+            logging.info('trying to delete %d messages from IMAP store', len(fetched))
+            resp = imap.uid('STORE', ','.join(map(str, fetched.keys())), '+FLAGS', r'(\Deleted)')
+            if resp[0] != 'OK':
+                raise Exception('failed to set \\Deleted flag: %s'%(resp))
+            resp = imap.expunge()
+            if resp[0] != 'OK':
+                raise Exception('failed to expunge! %s'%(resp))
+        else:
+            logging.info('not deleting any messages, since options.delete is not set')
+        logging.getLogger().setLevel(oldloglevel)
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(
+        description=DESCRIPTION,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        'config', nargs='+', metavar='CONFIG',
+        help="configuration file")
+    parser.add_argument(
+        '-v', '--verbose', action='store_true',
+        help="verbose log output")
+
+    if argcomplete:
+        argcomplete.autocomplete(parser)
+    elif '_ARGCOMPLETE' in os.environ:
+        logging.error('Argument completion requested but the "argcomplete" '
+                      'module is not installed. '
+                      'Maybe you want to "apt install python3-argcomplete"')
+        sys.exit(1)
+
+    args = parser.parse_args()
+
+    if args.verbose:
+        logging.getLogger().setLevel(logging.INFO)
+
+    errs = {}
+    for confname in args.config:
+        try:
+            pull_msgs(confname, args.verbose)
+        except imaplib.IMAP4.error as e:
+            logging.error('IMAP failure for config file %s: %s', confname, e)
+            errs[confname] = e
+    if errs:
+        sys.exit(1)
diff --git a/imap-dl.1.pod b/imap-dl.1.pod
new file mode 100644
index 0000000..1407d05
--- /dev/null
+++ b/imap-dl.1.pod
@@ -0,0 +1,88 @@
+=encoding utf8
+
+=head1 NAME
+
+imap-dl -- a simple replacement for a minimalist use of getmail
+
+=head1 SYNOPSIS
+
+B<imap-dl> [B<-v>|B<--verbose>] B<configfile>...
+
+=head1 DESCRIPTION
+
+If you use getmail to reach an IMAP server as though it were POP
+(retrieving from the server, storing it in a maildir and optionally
+deleting), you can point this script to the getmail config and it
+should do the same thing.
+
+It tries to ensure that the configuration file is of the expected
+type, and will otherwise terminate by raising an exception; it should
+not lose messages.
+
+If there's any interest in supporting other use cases for getmail,
+patches are welcome.
+
+=head1 OPTIONS
+
+B<-v> or B<--verbose> causes B<imap-dl> to print more details
+about what it is doing.
+
+In addition to parts of the standard B<getmail> configuration,
+B<imap-dl> supports the following keywords in the config file:
+
+B<options.on_size_mismatch> can be set to B<exception>, B<none>, or
+B<warn>.  This governs what to do when the remote IMAP server claims a
+different size in the message summary list than the actual message
+retrieval (default: B<exception>).
+
+=head1 EXAMPLE CONFIG
+
+If you've never used getmail, you can make the simplest possible
+config file like so:
+
+=over 4
+
+    [retriever]
+    server = mail.example.net
+    username = foo
+    password = sekr1t!
+
+    [destination]
+    path = /home/foo/Maildir
+
+    [options]
+    delete = True
+
+=back
+
+=head1 LIMITATIONS
+
+B<imap-dl> is currently deliberately minimal.  It is designed to be
+used by someone who treats their IMAP mailbox like a POP server.
+
+It works with IMAP-over-TLS only, and it just fetches all messages
+from the default IMAP folder.  It does not support all the various
+features of getmail.
+
+B<imap-dl> is deliberately implemented in a modern version of python3,
+and tries to just use the standard library.  It will not be backported
+to python2.
+
+B<imap-dl> uses imaplib, which means that it does synchronous calls to
+the imap server.  A more clever implementation would use asynchronous
+python to avoid latency/roundtrips.
+
+B<imap-dl> does not know how to wait and listen for new mail using
+IMAP IDLE.  This would be a nice additional feature.
+
+B<imap-dl> does not yet know how to deliver to an MDA (or to
+B<notmuch-insert>).  This would be a nice thing to be able to do.
+
+=head1 SEE ALSO
+
+https://tools.ietf.org/html/rfc3501, http://pyropus.ca/software/getmail/
+
+=head1 AUTHOR
+
+B<imap-dl> and this manpage were written by Daniel Kahn Gillmor,
+inspired by some functionality from the getmail project.
