Re: Sort and delete duplicate messages

Howard Bampton Sun, 03 May 2020 18:29:10 -0700

Is the goal to delete messages with the same subject line (but which may
have different bodies), or messages that are fully duplicates (so same
body, subject line, and most other headers)? "Duplicate" in the second case
is a lot harder, as you could have messages whose Received headers differ
but which are otherwise identical. To handle that case, I'd think you'd
want to:
1) Use scan or something similar to find messages with the same subject.
2) Use a custom scan template (or resort to grep) to find messages within
the previous set that have duplicated headers (presumably To, From,
Subject, and perhaps a few others).
3) For any duplicates that pass step 2, use mhstore or the like to extract
the bodies, and use md5 or cmp to verify that the bodies are identical too.
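
A minimal sketch of steps 2 and 3 in POSIX sh, offered as an illustration under stated assumptions rather than an nmh tool: `key_headers`, `msg_body` and `dup_messages` are hypothetical names. It works on message files directly (for example the paths printed by `mhpath +inbox all`), folds step 1 into the header check by comparing only the To, From and Subject lines (so differing Received headers are ignored), and uses cmp on the raw body after the first blank line instead of extracting parts with mhstore. That makes it stricter than step 3 as described: copies with different MIME boundaries won't match. It also assumes `mktemp -d` is available and does not unfold multi-line headers.

```shell
# Hypothetical helpers (not part of nmh). Assumptions: each argument is a
# plain message file whose header ends at the first blank line, and folded
# (multi-line) headers are not unfolded.

# key_headers FILE: print the To, From and Subject lines from the header
# block only, sorted so that header order does not matter.
key_headers() {
    sed -n '/^$/q; /^[Tt][Oo]:/p; /^[Ff][Rr][Oo][Mm]:/p; /^[Ss][Uu][Bb][Jj][Ee][Cc][Tt]:/p' "$1" | sort
}

# msg_body FILE: print everything after the first blank line. This is the
# raw, still MIME-encoded body; no mhstore-style decoding is done.
msg_body() {
    sed '1,/^$/d' "$1"
}

# dup_messages FILE...: print every file whose key headers AND body match
# an earlier file in the list. The body is a subshell so variables don't leak.
dup_messages() (
    tmp=$(mktemp -d) || exit 1
    i=0
    for f in "$@"; do
        i=$((i + 1))
        key_headers "$f" > "$tmp/h$i"
        msg_body "$f" > "$tmp/b$i"
        j=1
        # Compare against every earlier file (quadratic, fine for a folder).
        while [ "$j" -lt "$i" ]; do
            if cmp -s "$tmp/h$j" "$tmp/h$i" && cmp -s "$tmp/b$j" "$tmp/b$i"; then
                printf '%s\n' "$f"   # a later copy: candidate for deletion
                break
            fi
            j=$((j + 1))
        done
    done
    rm -rf "$tmp"
)
```

The first file in argument order is always kept; only later copies are printed, so with `dup_messages $(mhpath +inbox all)` the last path component of each printed line is the number of a message you might then pass to rmm after reviewing the list.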



On Sun, May 3, 2020 at 9:19 PM Ken Hornstein <[email protected]> wrote:

> >I know that 'sortm -textfield Subject' will sort messages according to
> >the subject field. Having run that command, is there a way to then
> >delete the first duplicate of each message in the list, such that if 1
> >and 2 are duplicates and 6 and 7 are duplicates, you would delete messages
> >2 and 7, leaving 1 and 6?
>
> I want to say you could pipe the output of scan into
> "uniq -d -f <num>".  Might require a custom scan format, but that
> seems relatively simple.
>
> Hm, a quick test:
>
> % scan -format '%(msg) %{subject}' | uniq -d -f 1
>
> suggests that it prints the first of each set of duplicates, not the
> later ones, so that isn't exactly what you want.  Might be a good starting
> point, though?  You could probably use uniq -c and pipe that to an awk
> script that does what you want.
>
> --Ken
>
>
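
The uniq -c plus awk idea can be collapsed into a single awk program that remembers every subject it has seen and prints the message number of each later copy. This is a minimal sketch, not an nmh command: `later_dups` is a hypothetical name, and it assumes the `%(msg) %{subject}` format from the test above and that scan lists messages in folder order, so after `sortm -textfield subject` the lower-numbered copy is the one kept. Because it uses an awk associative array instead of uniq, the duplicates don't even need to be adjacent.

```shell
# Hypothetical helper: read "MSGNUM SUBJECT" lines on stdin, as produced by
#   scan -format '%(msg) %{subject}'
# and print the number of every message whose exact subject has already
# appeared, so the first copy of each subject is kept.
later_dups() {
    awk '{
        subj = $0
        sub(/^[^ ]+ /, "", subj)     # strip the leading message number
        if (seen[subj]++) print $1   # true only for the 2nd, 3rd, ... copy
    }'
}
```

Something like `dups=$(scan -format '%(msg) %{subject}' | later_dups); [ -n "$dups" ] && rmm $dups` would then remove them; the `-n` guard matters because rmm with no message arguments defaults to the current message. Subjects are matched exactly, so "Re:" variants count as different, and if your scan output is cut to a fixed width, long subjects are only compared up to that width (scan's -width switch can raise it).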
