On 13Nov2013 09:06, Chris Down <ch...@chrisdown.name> wrote:
> On 2013-11-12 19:22:24 +0100, Jonas Petong wrote:
> > Today I accidentally copied my mails into the same folder where they
> > had been stored before (evil keybinding!!!) and now I'm faced with
> > about 1000 copies within my inbox. Since those duplicates do not have
> > a unique mail-id, it's hopeless to filter them with mutt's integrated
> > duplicate limiting pattern. Command '<limit>~=' has no effect in my
> > case and deleting them by hand will take me hours!
> >
> > I know this question has been (unsuccessfully) asked before. Anyhow,
> > is there a way to tag every other mail (literally every nth mail of my
> > inbox folder) and afterwards delete them? I know something about Linux
> > scripting but unfortunately I have no clue where to start, or even
> > which scripting language to use.
>
>     for every file:
>         read the file and record its message-id in a dict, in
>         { message-id: [file1, file2, ..., fileN] } form
>
>     for each key in that dict:
>         delete all filename values except the first
>
> It should not be very complicated to write. If nobody else comes up with
> something, I can possibly write it for you after work.
Based on Jonas' post:

    Since those duplicates do not have a unique mail-id, it's hopeless to
    filter them with mutt's integrated duplicate limiting pattern.
    Command '<limit>~=' has no effect

I'd infer that the message-id fields are unique.

Jonas: _Why_/_how_ did you get duplicate messages with distinct
message-ids? Have you verified (by inspecting a pair of duplicate
messages) that their Message-ID headers are different?

If the message-ids are unique for the duplicate messages, I would:

Move all the messages to a Maildir folder if they are not already so.
This lets you deal with each message as a distinct file.

Write a script along the lines of Chris Down's suggestion, but collate
messages by subject line, and store a tuple of:

    (message-file-path, Date:-header-value, Message-ID:-header-value)

You may then want to compare messages with identical Date: values.

Or, if you are truly sure that the folder contains an exact and complete
duplicate: load all the filenames, order them by Date: header, iterate
over the ordered list, and _move_ every second item into another Maildir
folder (in case you're wrong):

    L = []
    for each Maildir-file in new/ and cur/:
        load the message headers and get the Date: header string
        L.append( (Date:-value, Subject:-value, maildir-file-path) )
    L = sorted(L)
    for i in range(0, len(L), 2):
        move the file L[i][2] into another directory

Note that you don't need to _parse_ the Date: header; if these are
duplicated messages, the literal text of the Date: header should be
identical for the adjacent messages.

HOWEVER, you probably want to ensure that all the identical date/subject
groupings are only pairs, in case there are multiple distinct messages
with identical dates.

Cheers,
--
Cameron Simpson <c...@zip.com.au>

If you can't annoy somebody, there's little point in writing.
        - Kingsley Amis