Re: Filter script to remove html, fullquotes and header lines
On 2022-03-22 at 01:56, raf wrote:
> On Mon, Mar 21, 2022 at 01:28:28PM +0100, Martin Trautmann wrote:
> > On 2022-03-21 at 12:56, raf wrote:
> > > textmail can probably do at least some of what you want:
> > >
> > > https://raf.org/textmail
> > > https://github.com/raforg/textmail
> > >
> > > and it has some extensibility so you can supply external
> > > translation programs for the bits it doesn't do.
> > >
> > > it can operate on individual mail messages or mbox files.
> > > check the output carefully. :-)
> >
> > Thanks, that looks very helpful to strip attachments, to remove headers
> > and to convert message bodies - but it lacks the option to perform a
> > search and replace on message bodies!?
>
> The -C option lets you supply a custom external "attachment
> translation" program. That might help. Something like
> "-C text/plain:txt:cmd". You just need to write the "cmd" program
> in the language of your choosing. If it doesn't work, you could
> modify textmail to do what you need (rather than writing the whole
> thing) if you like perl.

Thanks, I didn't expect it there. I guess some simple sed commands would
be good enough here.

> If you try it and run into problems, let me know (off list)
> and we can probably get it to work.

Would I still need mutt to do the mails one by one and pipe them to
textmail? I expected textmail to work on full mbox files, but it handled
the first mail only.
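If a tool only handles the first message of an mbox, one possible workaround that does not need mutt is to split the mbox in a small script and pipe each message to the tool separately. The following is a minimal sketch using Python's standard mailbox module. The function name pipe_each_message and the idea of passing the external command as an argument list are illustrative assumptions; the thread does not say which options textmail would need here.

```python
import mailbox
import subprocess


def pipe_each_message(mbox_path, command):
    """Run `command` once per message in the mbox at `mbox_path`.

    Each message is written to the command's standard input as raw bytes.
    Returns a list holding the command's standard output for each message.
    `command` is an argument list such as ["some-filter", "--option"];
    the filter itself is whatever program you choose (hypothetical here).
    """
    outputs = []
    # create=False: fail loudly instead of creating an empty mbox by mistake.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            result = subprocess.run(
                command,
                input=message.as_bytes(),
                stdout=subprocess.PIPE,
                check=True,  # raise if the filter exits with an error
            )
            outputs.append(result.stdout)
    finally:
        box.close()
    return outputs
```

The results are returned instead of being written back, so the original mailbox is never modified; writing them to a new mbox is left to the caller.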
Re: Filter script to remove html, fullquotes and header lines
On Mon, Mar 21, 2022 at 01:28:28PM +0100, Martin Trautmann wrote:
> On 2022-03-21 at 12:56, raf wrote:
> > textmail can probably do at least some of what you want:
> >
> > https://raf.org/textmail
> > https://github.com/raforg/textmail
> >
> > and it has some extensibility so you can supply external
> > translation programs for the bits it doesn't do.
> >
> > it can operate on individual mail messages or mbox files.
> > check the output carefully. :-)
>
> Thanks, that looks very helpful to strip attachments, to remove headers
> and to convert message bodies - but it lacks the option to perform a
> search and replace on message bodies!?

The -C option lets you supply a custom external "attachment
translation" program. That might help. Something like
"-C text/plain:txt:cmd". You just need to write the "cmd" program
in the language of your choosing. If it doesn't work, you could
modify textmail to do what you need (rather than writing the whole
thing) if you like perl.

And it might not work. "Converting" to the same mimetype might
send it into an infinite loop. :-) I don't think so, but custom
translations are done before the built-in ones, so you might need
to run textmail twice.

I just had a look at your original post. textmail does (1) well.
It even detects vestigial text alternatives and doesn't replace
html with them. (2) would require a custom translator (but it
wouldn't have access to the headers, so it can't do the "bonus"
part as is). It can't delete headers based on their content
(as is), only by their names, so (3) can't be done as stated.
However, you can probably identify the names of headers that are
likely to be that long and supply a list of those names.

If you try it and run into problems, let me know (off list)
and we can probably get it to work.

cheers,
raf
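For item (2), the "cmd" program could be a small text filter. The sketch below is one way to write it in Python, under assumptions that are not confirmed anywhere in this thread: that the translator receives the text/plain body on standard input and returns the result on standard output (check the textmail manual for the actual calling convention). The name strip_trailing_quote and the default threshold of 10 lines are illustrative; the threshold follows the original request.

```python
import sys


def strip_trailing_quote(text, min_lines=10):
    """Remove a trailing block of '>' lines if it has at least min_lines lines.

    Blank lines after the quoted block are ignored when looking for it.
    Shorter quoted blocks, and quotes in the middle of the body, are kept.
    """
    lines = text.splitlines(keepends=True)

    # Skip blank lines at the very end of the body.
    end = len(lines)
    while end > 0 and not lines[end - 1].strip():
        end -= 1

    # Walk backwards over the contiguous run of quoted lines.
    start = end
    while start > 0 and lines[start - 1].startswith(">"):
        start -= 1

    if end - start >= min_lines:
        return "".join(lines[:start])
    return text


def main():
    # ASSUMPTION: body arrives on stdin, filtered body goes to stdout.
    sys.stdout.write(strip_trailing_quote(sys.stdin.read()))

# To use this as a standalone filter script, call main() at the end of the file.
```

A quote followed by a signature is not at the very end of the body, so this version leaves it alone; that is a deliberate, conservative choice for archive data. As noted above, a filter like this sees only the body, so the "bonus" check on In-Reply-To or References headers cannot be done here.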
Re: Filter script to remove html, fullquotes and header lines
On Mon, Mar 21, 2022 at 08:46:52AM +1100, Cameron Simpson wrote:
> On 20Mar2022 13:36, Martin Trautmann wrote:
> >do you know about any mutt script that would go from message to message
> >and
> >
> >1) remove a html part if a plain text part is given
> >
> >2) remove all trailing lines,
> >   starting with a quote sign ">"
> >   and at least e.g. 10 occurrences
> >
> >   such as (^>[.*][\r\n]){9,} before the end of the message
> >
> >   Maybe I could append xzxzxzx to the end of the message first, delete
> >a fullquote up to there and remove xzxzxzx again?
> >
> >   Bonus: Do not remove fullquotes for messages without in-reply-to or
> >references headers.
> >
> >3) remove header lines which are longer than 5 lines
> >
> >I want to shrink the size of some mailboxes for archive purposes,
> >without throwing away too much.
>
> I think you'll have to write your own.
>
> At minimum you need a full mail message parser so that you are not
> filtering, say, base64 or QP content incorrectly. So something which
> scans a mailbox and for each message:
> - decodes it completely
> - applies your filters
> - assembles the new message
> and write this out to a new mailbox (so it isn't destructive and can be
> compared to the original - you don't want to accidentally shred your
> archive).

If you want to offload some of the work to existing code, you might
look at things like GNU mailutils, or the tools that come with
maildrop, or some of the subcommands of https://github.com/djcb/mu

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
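The parse / filter / reassemble loop quoted above can be sketched with Python's standard email and mailbox modules. This is only an illustration of items (1) and (3), not a tested archive tool: the function names and the 5-line threshold are assumptions taken from the original request, and it writes to a new mbox so the source is left untouched. Because it changes only the MIME structure and the headers, it never has to decode base64 or quoted-printable bodies.

```python
import mailbox


def drop_html_alternative(msg):
    """Item (1): inside multipart/alternative, drop text/html parts,
    but only when a text/plain sibling exists."""
    for part in list(msg.walk()):
        if part.get_content_type() != "multipart/alternative":
            continue
        subparts = part.get_payload()
        if any(p.get_content_type() == "text/plain" for p in subparts):
            kept = [p for p in subparts if p.get_content_type() != "text/html"]
            part.set_payload(kept)


def drop_long_headers(msg, max_lines=5):
    """Item (3): remove header fields that are folded over more than
    max_lines physical lines. Other fields keep their original order."""
    items = msg.items()
    kept = [(name, value) for name, value in items
            if len(str(value).splitlines()) <= max_lines]
    for name in {name for name, _ in items}:
        del msg[name]          # deletes every field with this name
    for name, value in kept:
        msg[name] = value      # re-append the fields we are keeping


def filter_mbox(src_path, dst_path):
    """Copy src_path to dst_path, applying both filters to each message."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        for msg in src:
            drop_html_alternative(msg)
            drop_long_headers(msg)
            dst.add(msg)
    finally:
        dst.close()  # close() also flushes pending writes
        src.close()
```

After a run, the two files can be compared message by message before the original is discarded, as the advice above recommends.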
Re: Filter script to remove html, fullquotes and header lines
On 2022-03-21 at 12:56, raf wrote:
> textmail can probably do at least some of what you want:
>
> https://raf.org/textmail
> https://github.com/raforg/textmail
>
> and it has some extensibility so you can supply external
> translation programs for the bits it doesn't do.
>
> it can operate on individual mail messages or mbox files.
> check the output carefully. :-)

Thanks, that looks very helpful to strip attachments, to remove headers
and to convert message bodies - but it lacks the option to perform a
search and replace on message bodies!?
Re: Filter script to remove html, fullquotes and header lines
On Mon, Mar 21, 2022 at 07:18:01AM +0100, Martin Trautmann wrote:
> On 2022-03-20 at 22:46, Cameron Simpson wrote:
> > I think you'll have to write your own.
>
> I agree - but I hoped it could have been done with some fine tuning of an
> existing script.
>
> > At minimum you need a full mail message parser so that you are not
> > filtering, say, base64 or QP content incorrectly. So something which
> > scans a mailbox and for each message:
>
> That's why I wondered about doing it with mutt - it can do all that stuff.
>
> > - decodes it completely
> > - applies your filters
> > - assembles the new message
> > and write this out to a new mailbox (so it isn't destructive and can be
> > compared to the original - you don't want to accidentally shred your
> > archive).
>
> I would work on a copy of the mailbox file first, of course.
>
> > I'd do this in Python myself - it has a good email library and you can
> > do all the things you describe fairly easily with it.
>
> So Python it is... Never programmed it before myself.
>
> Thanks,
> Martin

i didn't see the original post but...

textmail can probably do at least some of what you want:

https://raf.org/textmail
https://github.com/raforg/textmail

and it has some extensibility so you can supply external
translation programs for the bits it doesn't do.

it can operate on individual mail messages or mbox files.
check the output carefully. :-)

cheers,
raf