[Nmh-workers] I need help reading the mhstore man page
The man page for mhstore recommends that, for the sake of security, I not put the -auto switch in .mh_profile. Whatever the security risk is, would it not also be present if I invoke mhstore with that switch? But the man page does not seem to recommend against that.

The '|' facility is an obvious security risk, but as I read the man page it would never be invoked unless my .mh_profile specifies a formatting string. So assuming that my .mh_profile has no entries of the form mhstore-store-type, what are the security risks of the -auto switch?

Norman Shapiro
___
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] I need help reading the mhstore man page
> The man page for mhstore recommends that, for the sake of security, I not put the -auto switch in .mh_profile. Whatever the security risk is, would it not also be present if I invoke mhstore with that switch? But the man page does not seem to recommend against that.

-auto uses the filename that may be present in the MIME headers as the filename of the output file. So, for example, if I were to send you a file named .cshrc (or .profile ... you get the idea), it could cause an issue if you didn't notice what it was doing.

Looking at it more closely ... you know, I think -clobber always is a terrible default.

I combine -auto with nmh-storage: /tmp. I think that's reasonable.

--Ken
Re: [Nmh-workers] General question - unsupported charset conversion
Ken Hornstein writes:

> I've been grappling with what to do when we have issues with character set conversion.

Unfortunately, I have a lot of experience and troubles with character set conversion.

> Specifically, I have two issues:
>
> - What to do if the character set is unsupported. Should we return the original bytes?

It is not the best idea. Some sequences of bytes are control sequences for the terminal. This sometimes leaves the terminal in an unusable state.

> An error? [..] Some string which says, "We cannot convert klingon-8842 to us-ascii" or the equivalent?

In practice it means spam in an exotic language, and at that point I know that I do not want to read such a message. In the rare cases when I want to read a message in a charset unsupported by my configuration, the advantage of the MH system is that it is possible to handle it separately. Save, decode, convert .. whatever.

> - What to do when we cannot convert a particular character. This is a little more clear; the general trend is to use a substitution character.

This is very frequent and causes a lot of trouble: an entire message in English with one foreign family name in the original spelling. The message is sent in UTF-8 but (suppose) my terminal supports only ASCII. Conversion would fail. I can prepare an example, but including it in this message could make it difficult to read.

In my personal opinion a very good choice is conversion into HTML entities, like &aogon; or &lstrok;. It remains quite readable and is still unique enough to convert it back in case of need.

max
Re: [Nmh-workers] General question - unsupported charset conversion
> Unfortunately, I have a lot of experience and troubles with character set conversion.

Well, if you just bit the bullet and switched to UTF-8, you wouldn't have all of these problems! :-)

>> Should we return the original bytes?
>
> It is not the best idea. Some sequences of bytes are control sequences for the terminal. This sometimes leaves the terminal in an unusable state.

Seems fine to me.

>> An error? [..] Some string which says, "We cannot convert klingon-8842 to us-ascii" or the equivalent?
>
> In practice it means spam in an exotic language, and at that point I know that I do not want to read such a message.

I can see that, but I'm not sure that's an appropriate choice for all cases (like, for instance, MIME parameters).

>> - What to do when we cannot convert a particular character. This is a little more clear; the general trend is to use a substitution character.
>
> This is very frequent and causes a lot of trouble: an entire message in English with one foreign family name in the original spelling. The message is sent in UTF-8 but (suppose) my terminal supports only ASCII. Conversion would fail.

Errr ... really? In the case I'm thinking of, the one foreign family name would have the offending character output as a '?' (or whatever). The conversion would go through fine.

> In my personal opinion a very good choice is conversion into HTML entities, like &aogon; or &lstrok;. It remains quite readable and is still unique enough to convert it back in case of need.

Um, ouch. Unless there's a common library that already implements that behavior, that's not on the table at all.

--Ken
Re: [Nmh-workers] General question - unsupported charset conversion
am> In my personal opinion a very good choice is conversion into
am> HTML entities, like &aogon; or &lstrok;. It remains quite readable and
am> is still unique enough to convert it back in case of need.

kr> Um, ouch. Unless there's a common library that already implements
kr> that behavior, that's not on the table at all.

Supposedly Recode does:
http://recode.progiciels-bpi.ca/index.html
Re: [Nmh-workers] General question - unsupported charset conversion
> kr> Um, ouch. Unless there's a common library that already implements
> kr> that behavior, that's not on the table at all.
>
> Supposedly Recode does:
> http://recode.progiciels-bpi.ca/index.html

A super-quick scan of our systems does not show that as something that comes out of the box installed on our systems (that's what I meant by common).

Also ... a super-quick glance at the documentation does not show exactly how to accomplish the desired behaviour. Convert all characters to HTML representation, yes. Convert only characters you cannot handle in the target character set to HTML equivalents? I am not seeing how that is done, but it's entirely possible I'm missing something.

--Ken
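The behaviour Ken asks about, replacing only the characters the target character set cannot hold, can at least be illustrated outside of iconv and Recode. The following is a minimal Python sketch using a custom codec error handler; it is not nmh or Recode code, and the handler name "htmlentity" and both function names are made up for this illustration. The round trip is only lossless if the original text contains no literal entity-like sequences of its own.

```python
import codecs
import html
from html.entities import html5

# Reverse map: single character -> a named HTML5 entity (without "&" and ";").
# html5 maps names such as "lstrok;" to the character they stand for.
_NAME_FOR_CHAR = {}
for name, value in sorted(html5.items()):
    if name.endswith(";") and len(value) == 1:
        _NAME_FOR_CHAR.setdefault(value, name[:-1])


def _entity_fallback(error):
    """Codec error handler: replace only the unencodable characters."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    pieces = []
    for ch in error.object[error.start:error.end]:
        name = _NAME_FOR_CHAR.get(ch)
        # Fall back to a numeric reference when no named entity exists.
        pieces.append(f"&{name};" if name else f"&#{ord(ch)};")
    return "".join(pieces), error.end


# "htmlentity" is a hypothetical handler name chosen for this sketch.
codecs.register_error("htmlentity", _entity_fallback)


def to_charset_with_entities(text, charset):
    """Encode text, turning unencodable characters into HTML entities."""
    return text.encode(charset, errors="htmlentity")


def from_entities(data, charset):
    """Reverse the substitution by expanding entities again."""
    return html.unescape(data.decode(charset))
```

Characters that the target charset can represent pass through untouched, so an all-ASCII message is not changed at all.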
Re: [Nmh-workers] General question - unsupported charset conversion
This gets very icky, very quickly :-P

My feeling is that if you don't recognize the source character set, you cannot possibly convert it to a display format in any secure manner. By default I think we should not display the content, but instead spit out a diagnostic, with the option to re-run the show (or whatever) with a command-line option that passes the content through unconverted.

I have mixed feelings about converting unknown characters to a proxy (e.g. '?'). This could be exploited to inject terminal escape sequences into xterm (or your VT220; I know people who still use them!).

Yet another argument for declaring nmh a utf8-only zone, and converting everything to that on the way in. We could bundle our own internal iconv and just call it a day.

--lyndon
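The "diagnostic by default" policy described above can be sketched as follows. This is a Python illustration under stated assumptions, not nmh code: the function name and the wording of the diagnostic are invented here, and Python's codec registry stands in for whatever iconv supports on a given system.

```python
import codecs


def decode_or_diagnose(data, charset):
    """Return (text, None) on success or (None, diagnostic) if the
    source character set is not known to the codec registry."""
    try:
        codecs.lookup(charset)
    except LookupError:
        return None, f"cannot convert from unsupported charset {charset}"
    # Known charset: decode, marking undecodable bytes with U+FFFD.
    return data.decode(charset, errors="replace"), None
```

A caller that receives a diagnostic can print it and leave the decision to pass the raw bytes through to an explicit option, which is the opt-in Lyndon describes.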
Re: [Nmh-workers] General question - unsupported charset conversion
Recode need not be required; it could just be an option. iconv currently isn't, after all, although they seem to complement each other. Recode is part of the core distrib of my older Ubuntu 10.02.

Selective recoding would probably require calls for the substrings of interest.

As an aside, recode's support for b64 and QP surfaces is pretty cool:
http://www.informatik.uni-hamburg.de/RZ/software/gnu/utilities/recode_12.html
Re: [Nmh-workers] General question - unsupported charset conversion
On Feb 28, 2014, at 12:01 PM, Ken Hornstein k...@pobox.com wrote:

> If we make sure we're converting all non-printable characters into something else, I'm unclear as to how that could happen. But if it can happen, please educate me!

It's a case of fooling the GB* and multibyte converters into aborting at an opportune moment. MRC had a bible of all these edge cases that, sadly, expired with him :-(

What we really need is for someone to run a fuzzer over the nmh code base and see just what blows up. I doubt the results will be pretty.

--lyndon
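Ken's premise, converting every non-printable character into something else before display, can be sketched in a few lines of Python. This is only an illustration with a hypothetical function name; it operates on text that has already been decoded, so it does not address Lyndon's concern about multibyte converters aborting partway through the input.

```python
def neutralize_control_chars(text, substitute="?"):
    """Replace every non-printable character except newline and tab."""
    keep = {"\n", "\t"}
    return "".join(
        ch if ch in keep or ch.isprintable() else substitute
        for ch in text
    )
```

With this filter, the ESC byte that starts a terminal escape sequence is replaced, so the rest of the sequence is shown as harmless literal text.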
Re: [Nmh-workers] General question - unsupported charset conversion
On Feb 28, 2014, at 12:01 PM, Ken Hornstein k...@pobox.com wrote:

> We'd still have to deal with what happens when you want to convert U+1F4A9 to ISO-8859-1.

That's not an illegal parse of the input, it's a composting problem. Not the same thing at all.
Re: [Nmh-workers] General question - unsupported charset conversion
> Recode need not be required, it could just be an option. iconv currently isn't afterall, although they seem to complement each other. Recode is part of the core distrib of my older Ubuntu 10.02.

Fair enough ... but iconv() is part of POSIX, so assuming that it's available is reasonable (if you don't have iconv(), we basically give up in terms of handling different character sets). Also, I am under the impression that Recode is a superset of iconv, since it seems like Recode uses iconv for the core character set conversion.

--Ken
Re: [Nmh-workers] General question - unsupported charset conversion
On Feb 28, 2014, at 12:24 PM, Ken Hornstein k...@pobox.com wrote:

> Fair enough ... but iconv() is part of POSIX, so assuming that it's available is reasonable (if you don't have iconv(), we basically give up in terms of handling different character sets).

Sadly, iconv() in practice is a nightmare. The mismatch between the versions shipped with base OS systems and the gnu-ized version(s) that applications expect makes it impossible to keep a clean build environment in the face of the two. (I just spent two days fighting this battle porting code to FreeBSD.)

Sucking in a POSIX-compliant iconv lets us be consistent everywhere. And if ours is tightly POSIX compliant, we are no more wrong than anyone else.

--lyndon
Re: [Nmh-workers] General question - unsupported charset conversion
>> We'd still have to deal with what happens when you want to convert U+1F4A9 to ISO-8859-1.
>
> That's not an illegal parse of the input, it's a composting problem. Not the same thing at all.

Sigh, IT'S THE SAME THING. iconv() returns EILSEQ at a particular point in your conversion buffer. What do you do next?

--Ken
Re: [Nmh-workers] General question - unsupported charset conversion
>> Sigh, IT'S THE SAME THING. iconv() returns EILSEQ at a particular point in your conversion buffer. What do you do next?
>
> In your example, emit a Pile Of Poo.

I know you're being flippant ... but it's a serious question. Right now, iconv() returns EILSEQ if you cannot convert an input character to the target character. It also does that if you have an invalid multibyte sequence. Based on _what you want to happen_, what, exactly, should be done from a programming perspective? Bail? Substitute something? Do something else?

--Ken
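The two failure cases being argued over (an invalid multibyte sequence in the input versus a valid character with no equivalent in the target) are reported the same way by iconv() as described above. For comparison only, Python's codec model splits conversion into a decode step and an encode step, which makes the two cases distinguishable. The sketch below uses a hypothetical function name and is not a statement about what iconv or nmh does.

```python
def classify_conversion(data, source, target):
    """Return ("ok", converted_bytes), ("invalid-input", byte_offset)
    or ("unrepresentable", character_index)."""
    try:
        text = data.decode(source)
    except UnicodeDecodeError as exc:
        # The input bytes are not valid in the source charset.
        return "invalid-input", exc.start
    try:
        return "ok", text.encode(target)
    except UnicodeEncodeError as exc:
        # The input was valid, but the target charset has no such character.
        return "unrepresentable", exc.start
```

Knowing which case occurred lets a caller apply different policies, for example bailing on invalid input while substituting for unrepresentable characters.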
Re: [Nmh-workers] General question - unsupported charset conversion
On Feb 28, 2014, at 1:01 PM, Ken Hornstein k...@pobox.com wrote:

> Based on _what you want to happen_, what, exactly, should be done from a programming perspective? Bail?

Yes! Bail! Don't be a vector for someone to do nasties!

If people want to see invalid content, they have cat(1) at hand. One of the joys of MH is that it makes those tools available.
Re: [Nmh-workers] General question - unsupported charset conversion
> Look, software cannot read minds. People would like it to, but I don't work for the NSA, so I don't buy into that concept. We have standards. For a reason. To eliminate ambiguity. MIME has been around for how many years now? There is no excuse in this day and age for any software to generate syntactically incorrect MIME content.

Here's the case I'm thinking about:

- You're running in an ISO-8859-1 locale. This is permitted, no one questions this.

- You get a MIME message that contains the following header:

    Content-Disposition: attachment; filename*=UTF-8''%F0%9F%92%A9.jpg

  This is a perfectly valid MIME header, formatted correctly by the MIME standards. There is no ambiguity here, no brain damage. Everything's above-board.

What, exactly, should I do with this MIME parameter? When you decode that and feed that to iconv() to convert it from UTF-8 to ISO-8859-1, you're going to get EILSEQ as an error. But this is not the sender's fault; they followed the rules (that's assuming I did the MIME formatting correctly; I was doing it by hand. Pretend that I got it right, for the sake of this discussion).

--Ken
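Ken's example can be reproduced step by step. The following Python sketch decodes the extended parameter value by hand (charset, empty language tag, percent-encoded bytes) and then attempts the conversion to ISO-8859-1. The function name is hypothetical, Python's codecs stand in for iconv(), and the final substitution is shown only as one possible policy, not as what the thread settled on.

```python
from urllib.parse import unquote_to_bytes


def decode_extended_param(value):
    """Decode an extended parameter value: charset'language'percent-encoded."""
    charset, _language, encoded = value.split("'", 2)
    return unquote_to_bytes(encoded).decode(charset)


name = decode_extended_param("UTF-8''%F0%9F%92%A9.jpg")
# name is now the single code point U+1F4A9 followed by ".jpg".

try:
    name.encode("iso-8859-1")
    fits_locale = True
except UnicodeEncodeError:
    # The header was valid; the target charset simply lacks the character.
    fits_locale = False

# One possible policy: substitute a placeholder for the missing character.
substituted = name.encode("iso-8859-1", errors="replace")
```

The decode step succeeds, which confirms that the header is well formed; only the conversion to the locale's character set fails.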
Re: [Nmh-workers] General question - unsupported charset conversion
> That is right. On the other hand, you never prevent malformed MIME parameters.

Remember that we're not talking about malformed MIME parameters; we're talking about entirely valid ones.

> It is not a problem in case of one or two missing or substituted symbols in long text. We can guess what is the me?ning of the word. For many non-convertible symbols reading of such a text is more similar to solving a crossword puzzle. What could be '??o??w??d'?

I understand that, but my answer is basically: yes, it's non-reversible. So what? Note that the original data is still in the email if the user wishes to examine it directly. We're talking about things that are being presented to a user, or something that ends up as a filename (for filenames, I wasn't thinking of using ? as a replacement character, but that's a minor detail).

It's not clear to me from your response what you think should be done in these cases: should we simply not use the MIME parameter at all? Somehow that seems like the wrong choice. Jerrad mentioned the Recode library, but it doesn't seem like something that I'm interested in using. Skipping over the offending character is an option as well, but still seems lousy.

I understand your point about it being difficult to figure out what the original text is, but I am trying to understand exactly WHAT should be done. Just rejecting the text out of hand seems to be a much worse solution. As you say, if there is confusion the user can go to the original mail, but to me that's an argument for giving the user at least something that they might be able to understand.

FWIW, I wanted to see what other MUAs do, so I decided to look at mutt; if iconv() fails and the target character set is UTF-8, it substitutes U+FFFD, otherwise it substitutes '?'.

--Ken
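The substitution rule Ken reports for mutt (U+FFFD when the target is UTF-8, '?' otherwise) falls out naturally from a two-step conversion with replacement enabled at each step. This is a Python sketch of that idea with a hypothetical function name; it is neither mutt's nor nmh's code, and Python's codecs stand in for iconv().

```python
def convert_with_substitution(data, source, target):
    """Convert bytes between charsets, substituting instead of failing."""
    # Invalid input sequences become U+FFFD (the replacement character).
    text = data.decode(source, errors="replace")
    # Characters the target cannot represent, including U+FFFD itself
    # for non-Unicode targets, become "?".
    return text.encode(target, errors="replace")
```

The result is not reversible, as noted above, but the message stays readable and the original bytes remain available in the stored mail.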
Re: [Nmh-workers] I need help reading the mhstore man page
>> The man page for mhstore recommends that, for the sake of security, I not put the -auto switch in .mh_profile. Whatever the security risk is, would it not also be present if I invoke mhstore with that switch? But the man page does not seem to recommend against that.

Yes, they're equivalent. Should we replace that recommendation with one that recommends nmh-storage and/or a non-default -clobber setting with -auto?

mhstore has the noted checks on the filename, and doesn't pass it or a mhstore-store- string through the shell. Is clobbering the only security concern with -auto?

> -auto uses the filename that may be present in the MIME headers as the filename of the output file. So, for example, if I were to send you a file named .cshrc (or .profile ... you get the idea), it could cause an issue if you didn't notice what it was doing.
>
> Looking at it more closely ... you know, I think -clobber always is a terrible default.

I agree, but that default maintains backward compatibility.

> I combine -auto with nmh-storage: /tmp. I think that's reasonable.

I use -auto -clobber ask

David
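The two safeguards discussed in this thread, distrusting the sender-supplied filename and refusing to overwrite an existing file, can be sketched together. This Python illustration is not mhstore's implementation and does not reproduce the checks listed in its man page; the function name and the specific rejection rules (no directory components, no leading dot) are assumptions chosen for the example.

```python
import os


def store_part(directory, suggested_name, data):
    """Write data under directory, refusing risky names and existing files."""
    name = os.path.basename(suggested_name)
    # Reject empty names, names with directory components, and dotfiles
    # such as .cshrc or .profile.
    if not name or name != suggested_name or name.startswith("."):
        raise ValueError(f"refusing suggested filename: {suggested_name!r}")
    path = os.path.join(directory, name)
    # Mode "xb" fails with FileExistsError instead of overwriting.
    with open(path, "xb") as handle:
        handle.write(data)
    return path
```

Storing into a scratch directory, in the spirit of nmh-storage: /tmp above, limits the damage further, since nothing important should live there to be overwritten.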