On Thu, Aug 18, 2022 at 10:10:38AM +0200, martin f krafft via Mutt-users wrote:
> Thanks for your responses so far!
>
> The reason I need this index is that I have to provide evidence of "a huge
> volume of mails" on a given topic, without actually sharing the emails.
If this is all you need to do, then do you really need to preserve the threading? It seems all you really need is a list of the messages on the given subject, presumably with enough information to demonstrate that they are distinct. Assuming you, or someone on your behalf, can get at them, you can probably get what you need from the mail system's logs. Or you can use formail to print just the interesting headers from each message in your maildir folder. Something like this:

    cd "$mail_folder/cur"
    for file in *; do
        formail -X from: -X subject: -X date: < "$file"
    done > some_output_file

This gives you one group of lines per message, containing those three headers. There should always be 3 lines per message, unless some messages are broken and lack one of those headers (which can happen, but shouldn't), or a header is folded across several lines (formail's -c option concatenates continued fields). The headers probably won't always be in the same order, though. You can, of course, specify additional headers.

formail provides -x and -X options: -x extracts just the header value, whereas -X extracts the full header line, including the field name. You want -X, so you know which header you're looking at and don't have to write code to figure it out heuristically.

You can also do all of the above entirely in Python, which avoids the nasty quoting problems of shell constructions like "for file in *; ...": use os.scandir() to walk the folder, and the subprocess module (or similar) to run formail on each message file. That also makes it possible to know that a given line of output belongs to the specific file you asked about, so the ordering, or even absence, of a message's headers stops being a concern. Read the output line by line, assigning each header value to a dict field based on the header name, and stuff your dicts in a list, or something.

Then you can pass your dicts to, say, csv.DictWriter and write them out as a CSV file. Or as JSON. Or add records for them to a database, so you can query the data. Or whatever.
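A minimal sketch of that Python approach, assuming formail is on your PATH and the messages are plain files in one maildir subfolder such as cur/. The function names (parse_formail_output, extract_headers, collect, write_csv) are made up for illustration; folded header lines are joined onto the header they continue:

```python
import csv
import os
import subprocess

# Headers to pull out of each message. formail's -X keeps the "Name:"
# prefix, so every output line can be matched to a field directly.
HEADERS = ["From", "Subject", "Date"]


def parse_formail_output(text, headers=HEADERS):
    """Turn `formail -X` output into a dict keyed by header name.

    Headers missing from the message come back as "".
    Continuation lines of a folded header are appended to that header.
    """
    record = {h: "" for h in headers}
    wanted = {h.lower(): h for h in headers}  # header names are case-insensitive
    current = None
    for line in text.splitlines():
        if line[:1] in (" ", "\t"):
            # Continuation of the previous (folded) header line.
            if current:
                record[current] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        current = wanted.get(name.strip().lower()) if sep else None
        if current:
            record[current] = value.strip()
    return record


def extract_headers(path, headers=HEADERS):
    """Run formail on one message file and return its headers as a dict."""
    args = ["formail"]
    for h in headers:
        args += ["-X", h + ":"]
    with open(path, "rb") as fh:
        # The message goes in on stdin and no shell is involved,
        # so odd characters in file names cause no quoting trouble.
        result = subprocess.run(args, stdin=fh, capture_output=True, check=True)
    text = result.stdout.decode("utf-8", errors="replace")
    record = parse_formail_output(text, headers)
    record["file"] = path  # ties every record to the file it came from
    return record


def collect(folder, headers=HEADERS):
    """Extract headers from every regular file in one maildir subfolder."""
    with os.scandir(folder) as entries:
        return [extract_headers(e.path, headers) for e in entries if e.is_file()]


def write_csv(records, out, headers=HEADERS):
    """Write one CSV row per message to the text stream `out`."""
    writer = csv.DictWriter(out, fieldnames=["file"] + list(headers))
    writer.writeheader()
    writer.writerows(records)
```

Something like write_csv(collect(path), sys.stdout), with path pointing at your folder's cur/ directory, would then print one row per message. When writing to a real file, open it with newline="" as the csv documentation recommends; json.dump(records, fh) gives you the JSON variant instead.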
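If you'd rather skip formail entirely, or do need the thread structure, Python's email module can read the same headers, including Message-ID, In-Reply-To and References. A rough sketch with hypothetical helper names: each message's parent is taken to be the last ID in References (falling back to In-Reply-To), and messages whose parent isn't in the folder are shown as thread roots:

```python
import email
import os
import re

MSGID_RE = re.compile(r"<[^<>]+>")  # loose match for <id@host> tokens


def read_message(path):
    """Parse one message file with the email module."""
    with open(path, "rb") as fh:
        return email.message_from_binary_file(fh)


def load_folder(folder):
    """Parse every regular file in one maildir subfolder."""
    with os.scandir(folder) as entries:
        return [read_message(e.path) for e in entries if e.is_file()]


def parent_of(msg):
    """Return the Message-ID this message replies to, or None."""
    refs = MSGID_RE.findall(str(msg.get("References", "")))
    if refs:
        return refs[-1]  # the last entry in References is the direct parent
    irt = MSGID_RE.findall(str(msg.get("In-Reply-To", "")))
    return irt[0] if irt else None


def thread_parents(messages):
    """Map each Message-ID to its parent's Message-ID (None for roots)."""
    parents = {}
    for msg in messages:
        ids = MSGID_RE.findall(str(msg.get("Message-ID", "")))
        if ids:  # messages without a usable Message-ID are skipped
            parents[ids[0]] = parent_of(msg)
    return parents


def children_of(parents):
    """Invert the parent map; a parent missing from the folder becomes None."""
    children = {}
    for mid, parent in parents.items():
        key = parent if parent in parents else None
        children.setdefault(key, []).append(mid)
    return children


def print_tree(children, node=None, depth=0):
    """Print the thread graph as an indented tree, starting from the roots.

    Messages caught in a reference cycle are never reached from a root,
    so they are not printed (and cannot cause infinite recursion).
    """
    for child in sorted(children.get(node, [])):
        print("  " * depth + child)
        print_tree(children, child, depth + 1)
```

For a folder, print_tree(children_of(thread_parents(load_folder(path)))) prints one Message-ID per line, indented under the message it replies to.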
If you really do need to show the thread graph, you can produce that yourself using the message IDs and references.

Is any of this better than just using Python's email module? Probably not... YMMV. But either way, if I had to solve this, I would just use Python, and not try to hack around with other utilities.

--
Derek D. Martin    http://www.pizzashack.org/    GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will
result in undeliverable mail due to spam prevention.  Sorry for the
inconvenience.