On December 16, 2002 at 22:09, Tomasz Ostrowski wrote:

> I needed to archive sent mail with MHonArc and I needed to put
> contents of To: header to message index. It was not possible with
> MHonArc-2.5.13 so I wrote a small patch that added rc-variable $TO$.
>
> Then I used
> <LiTemplate>
> <li><strong>$SUBJECT$</strong><br>
> $MSGLOCALDATE(CUR;%Y-%m-%d %H:%M)$<br>
> <em>From</em>: $FROM$<br>
> <em>To</em>: $TO$
> </li>
> </LiTemplate>
>
> Check the attached file MHonArc-2.5.13-to.patch.

The preferable method is to allow for arbitrary message header
variables instead of just To:. Otherwise, you end up replicating code
when people want 'cc' or other fields.

I've considered such a feature, but it does impact things like
mha-dbrecover and the types of comments that should be placed in
message files to allow recovering. And then address harvesters
complicate things (i.e., SPAMMODE). I may just have to punt on trying
to provide mha-dbrecover ability for arbitrary message header resource
variables.

What I envision is something like the following:

    <MsgFieldsSave>
    to
    cc
    </MsgFieldsSave>

And then for resource variable usage, you would access them like the
following:

    $MSGFIELD(CUR;to)$
    $MSGFIELD(CUR;cc)$

> 2. Attachments with non-ascii names
>
> I had problems with accessing attachments extracted with MHonArc from
> Windows if they had non-ascii characters in name or characters
> forbidden for file names: \/:*?"<>| (when using m2h_external::filter;
> usename).
>
> So I have written a patch that converts both types of characters to
> underscore, just like spaces in original MHonArc.
>
> Check the attached file MHonArc-2.5.13-attachment_name.patch.

Good catch. Probably more efficient would be to just exclude
whitespace and non-ascii characters in one tr// operation:

    $fname =~ tr/\0-\40\t\n\r\177-\377/_/;

> 3. Preserving charset of message
>
> Most mails I have to convert use central european ISO-8859-2
> encoding. Converting it to named entities did not work - it lacks

The named entities are going away for most of the iso-8859-x sets
since they were based on SGML. They will be replaced with Unicode
character entity references, and it has already been done for the
latest snapshot builds.

> browsers support. Using UTF-8 would make my archives un-grep-able so
> I wrote a patch that made possible that text/plain MIME-parts
> preserve original charset by adding rc-variable $CHARSET$.

This feature is insufficient.
It assumes that messages only contain a single text entity part,
which of course is wrong when dealing with MIME messages. MIME allows
you to have multiple text entities, with each one having a different
charset. Therefore, with your patch, the last filtered entity wins
out while the text from the other entities is mis-rendered in the
browser. (I have thought of doing something like your patch does in
the past, but due to the multipart issue, I did not.)

A more robust solution is under development where you will be able
to define a final text encoding that all text entities should be
converted to. Generally, you would use it to map everything to utf-8,
but if certain Perl modules are installed, you could have all text
data encoded to what you choose, like iso-8859-2. Of course, choosing
a non-universal encoding may cause characters to get "lost" in text
entities originally encoded in a different charset, but this may be
acceptable in some locales.

As for the un-grepable utf-8, I think people will eventually have to
deal with it if they want to have archives that are multi-lingual.
Perl 5.8 finally has robust utf8 support, so whipping up a grep-like
tool in Perl would solve the "grep" problem.

Unfortunately, HTML does not allow mixed character encodings in the
same document, making things problematic when trying to convert MIME
mail into HTML.

--ewh
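
For illustration only, the grep-like tool mentioned above could be
sketched roughly as follows. This is not part of MHonArc; the name
grep_utf8_file and the "path:line:text" output format are made up for
this example. It assumes the archive files are valid UTF-8 and that
the pattern given on the command line is UTF-8 as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not a MHonArc function): return every line of
# $path that matches $pattern, formatted as "path:lineno:text".
# The file is decoded from UTF-8 while reading, so the regex works on
# characters rather than raw bytes.
sub grep_utf8_file {
    my ($pattern, $path) = @_;
    my $re = qr/$pattern/;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, "$path:$.:$line" if $line =~ $re;
    }
    close $fh;
    return @hits;
}

# Command-line use: script.pl PATTERN FILE [FILE ...]
if (@ARGV >= 2) {
    my ($pattern, @files) = @ARGV;
    # Arguments arrive as bytes; decode so they compare as characters.
    $pattern = decode('UTF-8', $pattern);
    binmode(STDOUT, ':encoding(UTF-8)');
    print "$_\n" for map { grep_utf8_file($pattern, $_) } @files;
}
```

Because matching happens on decoded characters, things like /i and
character classes behave sensibly for non-ascii text, which a
byte-oriented grep does not guarantee.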
