On Thu, Aug 18, 2022 at 10:10:38AM +0200, martin f krafft via Mutt-users wrote:
> Thanks for your responses so far!
> 
> The reason I need this index is that I have to provide evidence of "a huge
> volume of mails" on a given topic, without actually sharing the emails.

If this is all you need to do, then, do you really need to preserve
the threading?  Seems all you really need is a list of the messages
for the given subject, presumably with enough info to demonstrate
their uniqueness.  Assuming you or someone else on your behalf can get
at them, you can probably get what you need from the mail system's
logs.  Or you can use formail to spit out just the headers that are
interesting from your maildir folder...  Something like this:

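  cd "$mail_folder/cur"
  for file in *; do
    # formail reads the message on stdin, so feed it each file
    formail -X from: -X subject: -X date: < "$file"
  # write outside cur/ so the output file isn't picked up by the glob
  done > ~/some_output_file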

This will give you a group of lines for each message, containing
those three headers.  There should normally be 3 lines per message,
unless you have some broken messages that lack one of those headers
(which can happen, but shouldn't) or a long header is folded across
several lines (formail's -c option joins those).  The headers probably
won't always be in the same order, though.  You can, of course,
specify additional headers...

The formail command provides -x and -X options; -x extracts just the
header value, whereas -X extracts the full header line.  You want -X
so you know which header you're looking at, so you don't have to write
code to try to figure it out heuristically.

You can do all of the above entirely in Python: use os.scandir() to
list the message files, which avoids the nasty quoting problems with
constructions like "for file in *; ...", and the subprocess module (or
similar) to execute the formail command for each one.  That also lets
you know that a given line of output belongs to the specific file
you're asking about, so the ordering and potential absence of the
message's headers become a non-concern.  Then you can read the output
line by line, assigning the header value to a dict field based on the
header name.  Stuff your dicts in a list, or something.
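
A rough sketch of that approach, in case it helps.  It assumes formail
is on your PATH; the function names, the HEADERS tuple and the extra
"file" key are made up for illustration, and a header that appears
more than once simply keeps its last value.

```python
import os
import subprocess

# Headers to extract; same set as the shell loop above.
HEADERS = ("from", "subject", "date")


def parse_headers(text):
    """Turn formail -X output into a dict keyed by lowercased header name."""
    record = {}
    last = None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and last is not None:
            # Folded continuation line: append it to the previous header.
            record[last] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip().lower()
            record[last] = value.strip()
    return record


def read_maildir(folder):
    """Run formail once per message file and collect one dict per message."""
    args = ["formail"]
    for header in HEADERS:
        args += ["-X", header + ":"]
    records = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            # No shell is involved, so odd file names need no quoting.
            with open(entry.path, "rb") as msg:
                result = subprocess.run(
                    args, stdin=msg, capture_output=True, check=True
                )
            text = result.stdout.decode("utf-8", errors="replace")
            record = parse_headers(text)
            record["file"] = entry.name  # tie the headers to their file
            records.append(record)
    return records
```

Something like read_maildir(os.path.join(mail_folder, "cur")) would
then give you the list of dicts, where mail_folder is whatever path
your maildir lives at.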

Then you can pass your dicts to, say, csv.DictWriter, and write them
out as a CSV file.  Or as JSON.  Or add records for them to a
database, so you can query the data.   Or whatever.  If you really
do need to show the thread graph, you can produce that yourself using
the message IDs and references.
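
For instance, a minimal sketch of the CSV step, with a "parent" column
as a starting point for the thread graph.  The FIELDS list, the
function names and the sample records are invented for illustration,
and the parent lookup assumes you also extracted the Message-ID,
References and In-Reply-To headers; it treats the last entry in
References as the direct parent, which is the usual convention but not
something every mail client honours.

```python
import csv
import io

# Hypothetical column set; "message-id" and "parent" are only useful
# if the threading headers were extracted as well.
FIELDS = ["file", "from", "subject", "date", "message-id", "parent"]


def parent_of(record):
    """Return the Message-ID this message replies to, or None."""
    refs = record.get("references", "").split()
    if refs:
        return refs[-1]
    reply_to = record.get("in-reply-to", "").split()
    return reply_to[0] if reply_to else None


def write_csv(records, out):
    """Write one CSV row per message; missing headers become empty cells."""
    writer = csv.DictWriter(
        out, fieldnames=FIELDS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for record in records:
        row = dict(record)
        row["parent"] = parent_of(record) or ""
        writer.writerow(row)


# Made-up sample data standing in for the real list of dicts.
records = [
    {"file": "msg1", "from": "a@example.org", "subject": "topic",
     "date": "Thu, 18 Aug 2022 10:10:38 +0200",
     "message-id": "<1@example.org>"},
    {"file": "msg2", "from": "b@example.org", "subject": "Re: topic",
     "message-id": "<2@example.org>", "references": "<1@example.org>"},
]
buf = io.StringIO()
write_csv(records, buf)
print(buf.getvalue())
```

To write a real file instead of a string buffer, open it with
newline="" as the csv module documentation recommends.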

Is any of this better than just using Python's email module?  Probably
not... YMMV.  But either way, if I had to solve this I would just use
Python, and not try to hack around with other utilities.
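
For comparison, a sketch of the same extraction using only the email
module, no formail involved.  The function names and the WANTED tuple
are my own invention; the parser is told to stop after the headers,
and the default policy decodes encoded (RFC 2047) subjects for you.

```python
import email.parser
import email.policy
import os

# Hypothetical set of headers to index.
WANTED = ("From", "Subject", "Date", "Message-ID")


def headers_from_bytes(raw):
    """Parse the header block of one raw message into a dict."""
    parser = email.parser.BytesParser(policy=email.policy.default)
    msg = parser.parsebytes(raw, headersonly=True)
    # Missing headers become empty strings instead of raising.
    return {name.lower(): str(msg.get(name, "")) for name in WANTED}


def index_folder(folder):
    """Return one dict per message file found in the given directory."""
    records = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            with open(entry.path, "rb") as fh:
                record = headers_from_bytes(fh.read())
            record["file"] = entry.name
            records.append(record)
    return records
```

This reads each message fully into memory, which is fine for ordinary
mail but worth reconsidering if the folder holds very large
attachments.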

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.
