getmail upstream appears to have no plans to convert to python3 in the
near future.

Some of us use only a minimal subset of features of getmail, and it
would be nice to have something simpler, with the main complexity
offloaded to the modern python3 stdlib.

Signed-off-by: Daniel Kahn Gillmor <d...@fifthhorseman.net>
---
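Reviewer note (below the cut line, so not part of the commit message):
a minimal sketch of how a getmail-style config maps onto configparser
lookups with fallbacks.  The helper name load_settings and the SAMPLE
text are hypothetical, used only for illustration; the defaults mirror
the ones in the script, and the port is read with getint so it arrives
as an integer.

```python
import configparser

# Hypothetical sample config, mirroring the example in the imap-dl docstring.
SAMPLE = """
[retriever]
server = mail.example.net
username = foo
password = sekr1t!

[destination]
path = /home/foo/Maildir

[options]
delete = True
"""


def load_settings(text):
    """Return a few of the settings imap-dl reads, applying its defaults."""
    conf = configparser.ConfigParser()
    conf.read_string(text)
    # Only the IMAP-over-TLS retriever is supported; anything else is refused.
    rtype = conf.get('retriever', 'type', fallback='SimpleIMAPSSLRetriever')
    if rtype.lower() != 'simpleimapsslretriever':
        raise NotImplementedError('unsupported retriever.type %s' % (rtype,))
    return {
        'server': conf.get('retriever', 'server'),
        'port': conf.getint('retriever', 'port', fallback=993),
        'delete': conf.getboolean('options', 'delete', fallback=False),
        'path': conf.get('destination', 'path'),
    }


settings = load_settings(SAMPLE)
print(settings)
```

Keys that are absent from the file (type, port) fall back to their
defaults, while a missing mandatory key such as retriever.server makes
configparser raise.
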
 Makefile                    |   1 +
 debian/mailscripts.install  |   1 +
 debian/mailscripts.manpages |   1 +
 imap-dl                     | 221 ++++++++++++++++++++++++++++++++++++
 imap-dl.1.pod               |  88 ++++++++++++++
 5 files changed, 312 insertions(+)
 create mode 100755 imap-dl
 create mode 100644 imap-dl.1.pod
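
Reviewer note: imaplib returns a FETCH of message bodies as a list that
mixes (header, literal) tuples with bare b')' items.  The sketch below
walks such a list the way the script does.  The function name
split_fetch_response and the sample data are hypothetical and
hand-written; no server is contacted.

```python
import re

# Same shape as the header line that precedes each BODY[] literal.
_fetch_header = re.compile(
    rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) '
    rb'(FLAGS \([\\A-Za-z ]*\) )?BODY\[\] \{(?P<size>[0-9]+)\}$')


def split_fetch_response(items):
    """Yield (uid, body) pairs from an imaplib-style FETCH result list."""
    for item in items:
        if isinstance(item, bytes):
            # The only bare bytes item expected is the closing paren.
            if item != b')':
                raise ValueError('unexpected bytes item %r' % (item,))
            continue
        header, body = item
        match = _fetch_header.match(header)
        if not match:
            raise ValueError('malformed fetch line %r' % (header,))
        # The announced literal size must match the octets received.
        if int(match['size']) != len(body):
            raise ValueError('expected %d octets, got %d'
                             % (int(match['size']), len(body)))
        yield int(match['uid']), body


# Hypothetical response data, hand-written for illustration.
sample = [
    (b'1 (UID 160 BODY[] {5}', b'hello'),
    b')',
    (b'2 (UID 161 FLAGS (\\Seen) BODY[] {3}', b'abc'),
    b')',
]
result = list(split_fetch_response(sample))
print(result)
```

Checking the literal size against the received octets before storing
anything is what lets the script refuse a truncated message instead of
delivering it.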

diff --git a/Makefile b/Makefile
index 352f6f0..860ec27 100644
--- a/Makefile
+++ b/Makefile
@@ -1,5 +1,6 @@
 MANPAGES=mdmv.1 mbox2maildir.1 \
        notmuch-slurp-debbug.1 notmuch-extract-patch.1 maildir-import-patch.1 \
+       imap-dl.1 \
        email-extract-openpgp-certs.1 \
        email-print-mime-structure.1 \
        notmuch-import-patch.1
diff --git a/debian/mailscripts.install b/debian/mailscripts.install
index 99216c1..221f7bf 100644
--- a/debian/mailscripts.install
+++ b/debian/mailscripts.install
@@ -6,3 +6,4 @@ notmuch-import-patch /usr/bin
 notmuch-extract-patch/notmuch-extract-patch /usr/bin
 email-extract-openpgp-certs /usr/bin
 email-print-mime-structure /usr/bin
+imap-dl /usr/bin
diff --git a/debian/mailscripts.manpages b/debian/mailscripts.manpages
index 6d7cb30..bfbd56b 100644
--- a/debian/mailscripts.manpages
+++ b/debian/mailscripts.manpages
@@ -6,3 +6,4 @@ notmuch-import-patch.1
 notmuch-extract-patch.1
 email-extract-openpgp-certs.1
 email-print-mime-structure.1
+imap-dl.1
diff --git a/imap-dl b/imap-dl
new file mode 100755
index 0000000..c2a8186
--- /dev/null
+++ b/imap-dl
@@ -0,0 +1,221 @@
+#!/usr/bin/python3
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2019 Daniel Kahn Gillmor
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or (at
+# your option) any later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+'''A simple replacement for a minimalist use of getmail.
+
+Usage: 
+
+   imap-dl [-v|--verbose] configfile…
+
+In particular, if you use getmail to reach an IMAP server as though it
+was POP (retrieving from the server and optionally deleting), you can
+point this script to the getmail config and it should do the same
+thing.
+
+It tries to ensure that the configuration file is of the expected
+type.  If it is not, it terminates by raising an exception, and it
+should not lose messages.
+
+If there's any interest in supporting other use cases for getmail,
+patches are welcome.
+
+If you've never used getmail, you can make the simplest possible
+config file like so:
+
+----------
+[retriever]
+server = mail.example.net
+username = foo
+password = sekr1t!
+
+[destination]
+path = /home/foo/Maildir
+
+[options]
+delete = True
+----------
+'''
+
+import configparser
+import sys
+import ssl
+import imaplib
+import re
+import logging
+import mailbox
+import os.path
+import statistics
+import time
+
+_summary_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) RFC822.SIZE (?P<size>[0-9]+)\)$')
+def break_fetch_summary(line):
+    '''b'1 (UID 160 RFC822.SIZE 1867)' -> {id: 1, uid: 160, size: 1867}'''
+    match = _summary_splitter.match(line)
+    if not match:
+        raise Exception('malformed summary line %s'%(line))
+    ret = {}
+    for i in ['id', 'uid', 'size']:
+        ret[i] = int(match[i])
+    return ret
+
+_fetch_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) (FLAGS \([\\A-Za-z ]*\) )?BODY\[\] \{(?P<size>[0-9]+)\}$')
+def break_fetch(line):
+    '''b'1 (UID 160 BODY[] {1867}' -> {id: 1, uid: 160, size: 1867}'''
+    match = _fetch_splitter.match(line)
+    if not match:
+        raise Exception('malformed fetch line %s'%(line))
+    ret = {}
+    for i in ['id', 'uid', 'size']:
+        ret[i] = int(match[i])
+    return ret
+
+def pull_msgs(configfile):
+    conf = configparser.ConfigParser()
+    conf.read_file(open(configfile, 'r'))
+    oldloglevel = logging.getLogger().getEffectiveLevel()
+    conf_verbose = conf.getint('options', 'verbose', fallback=1)
+    if conf_verbose > 1:
+        logging.getLogger().setLevel(logging.INFO)
+    logging.info('pulling from config file %s', configfile)
+    delete = conf.getboolean('options', 'delete', fallback=False)
+    read_all = conf.getboolean('options', 'read_all', fallback=True)
+    if not read_all:
+        raise NotImplementedError('imap-dl only supports options.read_all=True, got False')
+    rtype = conf.get('retriever', 'type', fallback='SimpleIMAPSSLRetriever')
+    if rtype.lower() != 'simpleimapsslretriever':
+        raise NotImplementedError('imap-dl only supports retriever.type=SimpleIMAPSSLRetriever, got %s'%(rtype,))
+    # FIXME: handle `retriever.record_mailbox`
+    dtype = conf.get('destination', 'type', fallback='Maildir')
+    if dtype.lower() != 'maildir':
+        raise NotImplementedError('imap-dl only supports destination.type=Maildir, got %s'%(dtype,))
+    dst = conf.get('destination', 'path')
+    dst = os.path.expanduser(dst)
+    if not os.path.isdir(dst):
+        raise Exception('expected destination directory, but %s is not a directory'%(dst,))
+    ca_certs = conf.get('retriever', 'ca_certs', fallback=None)
+    on_size_mismatch = conf.get('options', 'on_size_mismatch', fallback='exception').lower()
+    sizes_mismatched = []
+    ctx = ssl.create_default_context(cafile=ca_certs)
+    mdst = mailbox.Maildir(dst)
+    with imaplib.IMAP4_SSL(host=conf.get('retriever', 'server'),
+                           port=conf.getint('retriever', 'port', fallback=993),
+                           ssl_context=ctx) as imap:
+        resp = imap.login(conf.get('retriever', 'username'),
+                          conf.get('retriever', 'password'))
+        if resp[0] != 'OK':
+            raise Exception('login failed with %s as user %s on %s'%(
+                resp,
+                conf.get('retriever', 'username'),
+                conf.get('retriever', 'server')))
+        resp = imap.select()
+        if resp[0] != 'OK':
+            raise Exception('selection failed: %s'%(resp,))
+        if len(resp[1]) != 1:
+            raise Exception('expected exactly one EXISTS response from select, got %s'%(resp[1]))
+        n = int(resp[1][0])
+        if n == 0:
+            logging.info('No messages to retrieve')
+            logging.getLogger().setLevel(oldloglevel)
+            return
+        resp = imap.fetch('1:%d'%(n), '(UID RFC822.SIZE)')
+        if resp[0] != 'OK':
+            raise Exception('initial FETCH 1:%d not OK (%s)'%(n, resp))
+        pending = list(map(break_fetch_summary, resp[1]))
+        sizes = {}
+        for m in pending:
+            sizes[m['uid']] = m['size']
+        fetched = {}
+        uids = ','.join(map(lambda x: str(x['uid']), sorted(pending, key=lambda x: x['uid'])))
+        totalbytes = sum([x['size'] for x in pending])
+        logging.info('Fetching %d messages, expecting %d bytes of message content',
+                     len(pending), totalbytes)
+        # FIXME: sort by size?
+        # FIXME: fetch in batches or singly instead of all-at-once?
+        # FIXME: rolling deletion?
+        # FIXME: asynchronous work?
+        before = time.perf_counter()
+        resp = imap.uid('FETCH', uids, '(UID BODY.PEEK[])')
+        after = time.perf_counter()
+        if resp[0] != 'OK':
+            raise Exception('UID fetch failed %s'%(resp[0]))
+        for f in resp[1]:
+            # these objects are weirdly structured. i don't know why
+            # these trailing close-parens show up.  so this is very
+            # ad-hoc and nonsense
+            if isinstance(f, bytes):
+                if f != b')':
+                    raise Exception('got bytes object of length %d but expected simple closeparen'%(len(f),))
+            elif isinstance(f, tuple):
+                if len(f) != 2:
+                    raise Exception('expected 2-part tuple, got %d-part'%(len(f),))
+                m = break_fetch(f[0])
+                if m['size'] != len(f[1]):
+                    raise Exception('expected %d octets, got %d'%(
+                        m['size'], len(f[1])))
+                if m['size'] != sizes[m['uid']]:
+                    if on_size_mismatch == 'warn':
+                        if len(sizes_mismatched) == 0:
+                            logging.warning('size mismatch: summary said %d octets, fetch sent %d',
+                                            sizes[m['uid']], m['size'])
+                        elif len(sizes_mismatched) == 1:
+                            logging.warning('size mismatch: (mismatches after the first suppressed until summary)')
+                        sizes_mismatched.append(sizes[m['uid']] - m['size'])
+                    elif on_size_mismatch == 'exception':
+                        raise Exception('size mismatch: summary said %d octets, fetch sent %d\n(set options.on_size_mismatch to none or warn to avoid hard failure)'%(
+                                        sizes[m['uid']], m['size']))
+                    elif on_size_mismatch != 'none':
+                        raise Exception('size_mismatch: options.on_size_mismatch should be none, warn, or exception (found "%s")'%(on_size_mismatch,))
+                fname = mdst.add(f[1].replace(b'\r\n', b'\n'))
+                logging.info('stored message %d/%d (uid %d, %d bytes) in %s',
+                             len(fetched) + 1, len(pending), m['uid'], m['size'], fname)
+                del sizes[m['uid']]
+                fetched[m['uid']] = m['size']
+        if sizes:
+            logging.warning('unhandled UIDs: %s', sizes)
+        logging.info('%d bytes of %d messages fetched in %g seconds (~%g KB/s)',
+                     sum(fetched.values()), len(fetched), after - before,
+                     sum(fetched.values())/((after - before)*1024))
+        if on_size_mismatch == 'warn' and len(sizes_mismatched) > 1:
+            logging.warning('%d size mismatches out of %d messages (mismatches in bytes: mean %f, stddev %f)',
+                            len(sizes_mismatched), len(fetched),
+                            statistics.mean(sizes_mismatched),
+                            statistics.stdev(sizes_mismatched))
+        if delete:
+            logging.info('trying to delete %d messages from IMAP store', len(fetched))
+            resp = imap.uid('STORE', ','.join(map(str, fetched.keys())), '+FLAGS', r'(\Deleted)')
+            if resp[0] != 'OK':
+                raise Exception('failed to set \\Deleted flag: %s'%(resp))
+            resp = imap.expunge()
+            if resp[0] != 'OK':
+                raise Exception('failed to expunge! %s'%(resp))
+        else:
+            logging.info('not deleting any messages, since options.delete is not set')
+        logging.getLogger().setLevel(oldloglevel)
+
+if __name__ == '__main__':
+    args = sys.argv[1:]
+    for varg in ['-v', '--verbose']:
+        while varg in args:
+            logging.getLogger().setLevel(logging.INFO)
+            args.remove(varg)
+
+    if not args:
+        logging.error('no config files supplied, must supply at least one')
+        sys.exit(1)
+    for confname in args:
+        pull_msgs(confname)
diff --git a/imap-dl.1.pod b/imap-dl.1.pod
new file mode 100644
index 0000000..c57c359
--- /dev/null
+++ b/imap-dl.1.pod
@@ -0,0 +1,88 @@
+=encoding utf8
+
+=head1 NAME
+
+imap-dl -- a simple replacement for a minimalist use of getmail
+
+=head1 SYNOPSIS
+
+B<imap-dl> [B<-v>|B<--verbose>] B<configfile>...
+
+=head1 DESCRIPTION
+
+If you use getmail to reach an IMAP server as though it was POP
+(retrieving from the server, storing it in a maildir and optionally
+deleting), you can point this script to the getmail config and it
+should do the same thing.
+
+It tries to ensure that the configuration file is of the expected
+type.  If it is not, it terminates by raising an exception, and it
+should not lose messages.
+
+If there's any interest in supporting other use cases for getmail,
+patches are welcome.
+
+=head1 OPTIONS
+
+B<-v> or B<--verbose> causes B<imap-dl> to print more details
+about what it is doing.
+
+In addition to parts of the standard B<getmail> configuration,
+B<imap-dl> supports the following keywords in the config file:
+
+B<options.on_size_mismatch> can be set to B<exception>, B<none>, or
+B<warn>.  This governs what to do when the remote IMAP server claims a
+different size in the message summary list than the actual message
+retrieval (default: B<exception>).
+
+=head1 EXAMPLE CONFIG
+
+If you've never used getmail, you can make the simplest possible
+config file like so:
+
+=over 4
+
+    [retriever]
+    server = mail.example.net
+    username = foo
+    password = sekr1t!
+
+    [destination]
+    path = /home/foo/Maildir
+
+    [options]
+    delete = True
+
+=back
+
+=head1 LIMITATIONS
+
+B<imap-dl> is currently deliberately minimal.  It is designed to be
+used by someone who treats their IMAP mailbox like a POP server.
+
+It works with IMAP-over-TLS only, and it just fetches all messages
+from the default IMAP folder.  It does not support all the various
+features of getmail.
+
+B<imap-dl> is deliberately implemented in a modern version of python3,
+and tries to just use the standard library.  It will not be backported
+to python2.
+
+B<imap-dl> uses imaplib, which means that it does synchronous calls to
+the imap server.  A more clever implementation would use asynchronous
+python to avoid latency/roundtrips.
+
+B<imap-dl> does not know how to wait and listen for new mail using
+IMAP IDLE.  This would be a nice additional feature.
+
+B<imap-dl> does not yet know how to deliver to an MDA (or to
+B<notmuch-insert>).  This would be a nice thing to be able to do.
+
+=head1 SEE ALSO
+
+https://tools.ietf.org/html/rfc3501, http://pyropus.ca/software/getmail/
+
+=head1 AUTHOR
+
+B<imap-dl> and this manpage were written by Daniel Kahn Gillmor,
+inspired by some functionality from the getmail project.
-- 
2.23.0
