[Nmh-workers] I need help reading the mhstore man page

2014-02-28 Thread norm
The man page for mhstore recommends that, for the sake of security, I not put
the -auto switch in .mh_profile. Whatever the security risk is, would it not
also be present if I invoke mhstore with that switch? But the man page does
not seem to recommend against that.

The '|' facility is an obvious security risk, but as I read the man page it
would never be invoked unless my .mh_profile specifies a formatting string.

So assuming that my .mh_profile has no entries of the form

mhstore-store-type

what are the security risks of the -auto switch?


Norman Shapiro

___
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers


Re: [Nmh-workers] I need help reading the mhstore man page

2014-02-28 Thread Ken Hornstein
> The man page for mhstore recommends that, for the sake of security, I not put
> the -auto switch in .mh_profile. Whatever the security risk is, would it not
> also be present if I invoke mhstore with that switch? But the man page does
> not seem to recommend against that.

-auto uses the filename that may be present in the MIME headers as the
filename of the output file.  So, for example, if I were to send you a
file named .cshrc (or .profile ... you get the idea), it could cause
an issue if you didn't notice what it was doing.  Looking at it more
closely ... you know, I think -clobber always is a terrible default.

I combine -auto with nmh-storage: /tmp.  I think that's reasonable.
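The two mitigations mentioned here (a dedicated storage directory and not overwriting existing files) can be illustrated with a short Python sketch. This is not mhstore's code: `store_attachment` and its rejection rules are hypothetical and only show the idea of treating a sender-supplied filename as untrusted.

```python
import os

def store_attachment(storage_dir, suggested_name, data):
    """Write data under storage_dir using a sender-suggested name, never clobbering."""
    # Keep only the last path component so "../../.profile" cannot escape storage_dir.
    name = os.path.basename(suggested_name.replace("\\", "/"))
    # Hypothetical policy: refuse empty names and dot files such as ".cshrc".
    if not name or name.startswith("."):
        raise ValueError(f"refusing suspicious filename: {suggested_name!r}")
    path = os.path.join(storage_dir, name)
    # Mode "xb" creates the file exclusively; it raises FileExistsError
    # instead of overwriting an existing file.
    with open(path, "xb") as f:
        f.write(data)
    return path
```

With this policy a second attachment with the same name fails loudly rather than silently replacing the first one.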

--Ken



Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Aleksander Matuszak
Ken Hornstein writes:

> I've been grappling with what to do when we have issues with character set
> conversion.

Unfortunately, I have a lot of experience, and a lot of trouble, with
character set conversion.

> Specifically, I have two issues:
>
> - What to do if the character set is unsupported.
>
> Should we return the original bytes?

That is not the best idea. Some byte sequences are terminal control
sequences, and they sometimes leave the terminal in an unusable state.

> An error? [..]  Some string which says, "We cannot convert
> klingon-8842 to us-ascii" or the equivalent?

In practice that means spam in an exotic language, and at that point I know
that I do not want to read the message.

In the rare cases when I do want to read a message in a charset that my
configuration does not support, the advantage of the MH system is that I
can handle it separately: save, decode, convert, whatever.


> - What to do when we cannot convert a particular character.  This is a
> little more clear; the general trend is to use a substitution
> character.

This happens very often and causes a lot of trouble. Consider a message
that is entirely in English except for one foreign family name in its
original spelling. The message is sent in UTF-8 but (suppose) my terminal
supports only ASCII. Conversion would fail.

I can prepare an example, but including it in this message could make it
difficult to read.

In my personal opinion a very good choice is conversion into
HTML entities, like &aogon; or &lstrok;. The result remains quite readable
and is still unique enough to convert back if needed.
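To make the proposal concrete, here is a minimal Python sketch of "convert only the characters the target charset cannot represent into HTML entities". It relies on Python's codec error-handler hook rather than iconv(), so it shows the behaviour, not a way to implement it in nmh; the handler name `htmlentity` and the helper `to_charset` are made up for this example.

```python
import codecs
from html.entities import html5

# Reverse map: single character -> shortest HTML5 entity name (e.g. "lstrok;").
_NAMES = {}
for name, value in html5.items():
    if name.endswith(";") and len(value) == 1:
        best = _NAMES.get(value)
        if best is None or (len(name), name) < (len(best), best):
            _NAMES[value] = name

def _entity_handler(exc):
    """Replace each unencodable character with a named or numeric entity."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    repl = "".join(
        "&" + _NAMES[ch] if ch in _NAMES else f"&#{ord(ch)};" for ch in bad
    )
    # Resume encoding right after the offending run.
    return repl, exc.end

codecs.register_error("htmlentity", _entity_handler)

def to_charset(text, charset):
    """Encode text, turning only unrepresentable characters into entities."""
    return text.encode(charset, errors="htmlentity")
```

Characters the target can represent pass through untouched, so a Latin-1 terminal would still see "é" as a real character while "ł" becomes an entity.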

max




Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Ken Hornstein
> Unfortunately, I have a lot of experience, and a lot of trouble, with
> character set conversion.

Well, if you just bit the bullet and switched to UTF-8, you wouldn't have
all of these problems! :-)

>> Should we return the original bytes?
>
> That is not the best idea. Some byte sequences are terminal control
> sequences, and they sometimes leave the terminal in an unusable state.

Seems fine to me.

>> An error? [..]  Some string which says, "We cannot convert
>> klingon-8842 to us-ascii" or the equivalent?
>
> In practice that means spam in an exotic language, and at that point I know
> that I do not want to read the message.

I can see that, but I'm not sure that's an appropriate choice for all
cases (like, for instance, MIME parameters).

>> - What to do when we cannot convert a particular character.  This is a
>> little more clear; the general trend is to use a substitution
>> character.
>
> This happens very often and causes a lot of trouble. Consider a message
> that is entirely in English except for one foreign family name in its
> original spelling. The message is sent in UTF-8 but (suppose) my terminal
> supports only ASCII. Conversion would fail.

Errr ... really?  In the case I'm thinking of, the one foreign family
name would have the offending character output as a '?' (or whatever).
The conversion would go through fine.

> In my personal opinion a very good choice is conversion into
> HTML entities, like &aogon; or &lstrok;. The result remains quite readable
> and is still unique enough to convert back if needed.

Um, ouch.  Unless there's a common library that already implements
that behavior, that's not on the table at all.

--Ken



Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Jerrad Pierce
am> In my personal opinion a very good choice is conversion into
am> HTML entities, like &aogon; or &lstrok;. It remains quite readable and
am> is still unique enough to convert it back in case of need.

kr> Um, ouch.  Unless there's a common library that already implements
kr> that behavior, that's not on the table at all.

Supposedly Recode does: http://recode.progiciels-bpi.ca/index.html



Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Ken Hornstein
> kr> Um, ouch.  Unless there's a common library that already implements
> kr> that behavior, that's not on the table at all.
>
> Supposedly Recode does: http://recode.progiciels-bpi.ca/index.html

A super-quick scan of our systems does not show that as something that
comes installed out of the box on our systems (that's what I meant by
"common").  Also ... a super-quick glance at the documentation does
not show exactly how to accomplish the desired behaviour.  Convert all
characters to HTML representation, yes.  Convert only characters you cannot
handle in the target character set to HTML equivalents?  I am not seeing
how that is done, but it's entirely possible I'm missing something.

--Ken



Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Lyndon Nerenberg
This gets very icky, very quickly :-P

My feeling is that if you don't recognize the source character set, you cannot 
possibly convert it to a display format in any secure manner.  By default I 
think we should not display the content, but instead spit out a diagnostic, 
with the option to re-run the show (or whatever) with a command-line option 
that passes the content through unconverted.
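The default proposed above (diagnostic first, raw content only on explicit request) might look like the following Python sketch. The function name `render_text`, the `passthrough` flag, and the wording of the diagnostic are all hypothetical; nmh itself would do this in C with its own option names.

```python
import codecs

def render_text(raw, charset, passthrough=False):
    """Return text for display, or a diagnostic if the charset is unknown."""
    try:
        codecs.lookup(charset)
    except LookupError:
        if passthrough:
            # Explicit opt-in: map each byte 1:1 to a code point so that
            # nothing is interpreted or dropped.
            return raw.decode("latin-1")
        return f"[cannot display: unsupported charset {charset!r}]"
    # Known charset: decode, marking undecodable bytes with U+FFFD.
    return raw.decode(charset, errors="replace")
```

The point of the design is that the unsafe path is never taken by accident; the user has to ask for it.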

I'm of mixed feelings about converting unknown characters to a proxy (e.g. 
'?').  This could be exploited to inject terminal escape sequences into xterm 
(or your VT220 – I know people who still use them!).
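One way to limit the escape-sequence risk, whatever the converter produced, is to scrub the decoded text before it reaches the terminal. The sketch below is an illustration in Python under that assumption; `make_printable` is a hypothetical helper, and it does not address converters that abort midway through the input.

```python
import unicodedata

def make_printable(text, substitute="?"):
    """Replace control and other non-printable characters, keeping newline and tab."""
    out = []
    for ch in text:
        if ch in "\n\t":
            out.append(ch)
        elif unicodedata.category(ch).startswith("C"):
            # Categories starting with "C" cover controls (ESC, C1 CSI, ...),
            # format characters, surrogates, private use, and unassigned.
            out.append(substitute)
        else:
            out.append(ch)
    return "".join(out)
```

Because the check runs on decoded characters, it catches both the 7-bit ESC (U+001B) and the 8-bit CSI (U+009B).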

Yet another argument for declaring nmh a utf8-only zone and converting
everything to that on the way in.  We could bundle our own internal iconv and
just call it a day.

--lyndon





Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Jerrad Pierce
Recode need not be required, it could just be an option. iconv currently
isn't after all, although they seem to complement each other. Recode is
part of the core distrib of my older Ubuntu 10.02.

Selective recoding would probably require calls for the substrings of interest.

As an aside, recode's support for b64 and QP surfaces is pretty cool:
http://www.informatik.uni-hamburg.de/RZ/software/gnu/utilities/recode_12.html



Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Lyndon Nerenberg

On Feb 28, 2014, at 12:01 PM, Ken Hornstein k...@pobox.com wrote:

> If we make sure we're converting all non-printable characters into something
> else, I'm unclear as to how that could happen.  But if it can happen, please
> educate me!

It's a case of fooling the GB* and multibyte converters into aborting at an 
opportune moment.  MRC had a bible of all these edge cases that, sadly, expired 
with him :-(

What we really need is for someone to run a fuzzer over the nmh code base and 
see just what blows up.  I doubt the results will be pretty.

--lyndon





Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Lyndon Nerenberg

On Feb 28, 2014, at 12:01 PM, Ken Hornstein k...@pobox.com wrote:

> We'd still have to deal with what happens when
> you want to convert U+1F4A9 to ISO-8859-1.

That's not an illegal parse of the input, it's a composting problem.  Not the 
same thing at all.






Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Ken Hornstein
> Recode need not be required, it could just be an option. iconv currently
> isn't after all, although they seem to complement each other. Recode is
> part of the core distrib of my older Ubuntu 10.02.

Fair enough ... but iconv() is part of POSIX, so assuming that it's available
is reasonable (if you don't have iconv(), we basically give up in terms of
handling different character sets).  Also, I am under the impression that
Recode is a superset of iconv, since it seems like Recode uses iconv
for the core character set conversion.

--Ken



Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Lyndon Nerenberg

On Feb 28, 2014, at 12:24 PM, Ken Hornstein k...@pobox.com wrote:

> Fair enough ... but iconv() is part of POSIX, so assuming that it's available
> is reasonable (if you don't have iconv(), we basically give up in terms of
> handling different character sets).

Sadly, iconv() in practice is a nightmare.  The mismatch between the versions
shipped with base operating systems and the GNU-ized version(s) that
applications expect makes it impossible to keep a clean build environment.
(I just spent two days fighting this battle porting code to FreeBSD.)

Sucking in a POSIX-compliant iconv lets us be consistent everywhere.  And if 
ours is tightly POSIX compliant, we are no more wrong than anyone else.

--lyndon





Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Ken Hornstein
>> We'd still have to deal with what happens when you want to convert
>> U+1F4A9 to ISO-8859-1.
>
> That's not an illegal parse of the input, it's a composting problem.
> Not the same thing at all.

Sigh, IT'S THE SAME THING.  iconv() returns EILSEQ at a particular point
in your conversion buffer.  What do you do next?

--Ken



Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Ken Hornstein
>> Sigh, IT'S THE SAME THING.  iconv() returns EILSEQ at a particular point
>> in your conversion buffer.  What do you do next?
>
> In your example, emit a Pile Of Poo.

I know you're being flippant ... but it's a serious question.  Right now,
iconv() returns EILSEQ if you cannot convert an input character to the
target character.  It also does that if you have an invalid multibyte
sequence.  Based on _what you want to happen_, what, exactly, should be
done from a programming perspective?  Bail?  Substitute something?  Do
something else?
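The two failure cases being argued about can be told apart if the conversion is split into a decode step and an encode step. The Python sketch below does that only to illustrate the distinction; Python's codecs happen to raise different exceptions for the two cases, which is not how iconv() reports them. The name `classify_conversion` is invented for this example.

```python
def classify_conversion(raw, src, dst):
    """Report why converting raw bytes from src to dst would fail, if it would."""
    try:
        text = raw.decode(src)
    except UnicodeDecodeError:
        # The input is not a valid byte sequence in the source charset.
        return "malformed"
    try:
        text.encode(dst)
    except UnicodeEncodeError:
        # Valid input, but some character has no representation in dst.
        return "unrepresentable"
    return "ok"
```

A caller could then bail on "malformed" and substitute on "unrepresentable", or treat both the same way; the sketch takes no position on which policy is right.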

--Ken



Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Lyndon Nerenberg

On Feb 28, 2014, at 1:01 PM, Ken Hornstein k...@pobox.com wrote:

> Based on _what you want to happen_, what, exactly, should be
> done from a programming perspective?  Bail?

Yes! Bail! Don't be a vector for someone to do nasties!

If people want to see invalid content, they have cat(1) at hand.  One of the 
joys of MH is that it makes those tools available.




Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Ken Hornstein
> Look, software cannot read minds.  People would like it to, but I don't
> work for the NSA, so I don't buy into that concept.  We have standards.
> For a reason.  To eliminate ambiguity.  MIME has been around for how
> many years now?  There is no excuse in this day and age for any software
> to generate syntactically incorrect MIME content.

Here's the case I'm thinking about:

- You're running in an ISO-8859-1 locale.  This is permitted, no one questions
  this.
- You get a MIME message that contains the following header:

Content-Disposition: attachment; filename*=UTF-8''%F0%9F%92%A9.jpg

  This is a perfectly valid MIME header, formatted correctly by the MIME
  standards.  There is no ambiguity here, no brain damage.  Everything's
  above-board.

What, exactly, should I do with this MIME parameter?  When you decode
that and feed that to iconv() to convert it from UTF-8 to ISO-8859-1,
you're going to get EILSEQ as an error.  But this is not the sender's
fault; they followed the rules (that's assuming I did the MIME
formatting correctly; I was doing it by hand.  Pretend that I got it
right, for the sake of this discussion).
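For readers who want to follow the example by hand, here is a small Python sketch that decodes the extended parameter value above and then fits the result to an ISO-8859-1 locale. It handles only a single `charset'language'value` parameter (no continuations), and both helper names and the `_` substitute are assumptions made for illustration, not anything nmh does.

```python
from urllib.parse import unquote_to_bytes

def decode_extended_param(value):
    """Decode one RFC 2231 extended value: charset'language'percent-encoded."""
    charset, _language, encoded = value.split("'", 2)
    return unquote_to_bytes(encoded).decode(charset)

def fit_to_charset(name, charset, substitute="_"):
    """Replace characters the target charset cannot represent."""
    out = []
    for ch in name:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            ch = substitute
        out.append(ch)
    return "".join(out)

# The parameter value from the header in this message.
name = decode_extended_param("UTF-8''%F0%9F%92%A9.jpg")
local_name = fit_to_charset(name, "iso-8859-1")
```

Here `name` is U+1F4A9 followed by ".jpg", which is valid input; only the second step, mapping it into ISO-8859-1, has to make a policy choice.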

--Ken



Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Ken Hornstein
> That is right. On the other hand, you never prevent malformed MIME
> parameters.

Remember that we're not talking about malformed MIME parameters; we're
talking about entirely valid ones.

> It is not a problem in case of one or two missing or substituted
> symbols in long text. We can guess what is the me?ning of the word.
> For many non-convertible symbols reading of such a text is more
> similar to solving a crossword puzzle. What could be '??o??w??d'

I understand that, but my answer is basically: yes, it's non-reversible.
So what?  Note that the original data is still in the email if the user
wishes to examine it directly.

We're talking about things that are being presented to a user, or
something that ends up as a filename (for filenames, I wasn't thinking
of using ? as a replacement character, but that's a minor detail).
It's not clear to me from your response what you think should be done
in these cases: should we simply not use the MIME parameter at all?
Somehow that seems like the wrong choice.  Jerrad mentioned the Recode
library, but it doesn't seem like something that I'm interested in
using.  Skipping over the offending character is an option as well, but
still seems lousy.

I understand your point about it being difficult to figure out what the
original text is, but I am trying to understand exactly WHAT should be
done.  Just rejecting the text out of hand seems to be a much worse
solution.  As you say, if there is confusion the user can go to the original
mail, but to me that's an argument for giving the user at least something
that they might be able to understand.

FWIW, I wanted to see what other MUAs do, so I decided to look at mutt;
if iconv() fails and the target character set is UTF-8, it substitutes
U+FFFD, otherwise it substitutes '?'.
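The substitution policy described for mutt can be approximated in a few lines of Python. This is a sketch of the policy as stated above, not mutt's implementation, and `convert_lossy` is a hypothetical name.

```python
def convert_lossy(raw, src, dst):
    """Convert between charsets, substituting instead of failing."""
    # Undecodable input bytes become U+FFFD at this step.
    text = raw.decode(src, errors="replace")
    # Python's encoders emit "?" for anything dst cannot represent.
    # A UTF-8 target can represent U+FFFD itself, so it is kept there.
    return text.encode(dst, errors="replace")
```

The result is lossy in exactly the way discussed in this thread: the output is always displayable, but the original characters cannot be recovered from it.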

--Ken



Re: [Nmh-workers] I need help reading the mhstore man page

2014-02-28 Thread David Levine
> The man page for mhstore recommends that, for the sake of security,
> I not put the -auto switch in .mh_profile. Whatever the security
> risk is, would it not also be present if I invoke mhstore with that
> switch? But the man page does not seem to recommend against that.

Yes, they're equivalent.

Should we replace that recommendation with one that recommends
nmh-storage and/or a non-default -clobber setting with -auto?  mhstore
has the noted checks on the filename, and doesn't pass it or a
mhstore-store- string through the shell.  Is clobbering the only
security concern with -auto?

> -auto uses the filename that may be present in the MIME headers as the
> filename of the output file.  So, for example, if I were to send you a
> file named .cshrc (or .profile ... you get the idea), it could cause
> an issue if you didn't notice what it was doing.  Looking at it more
> closely ... you know, I think -clobber always is a terrible default.

I agree, but that default maintains backward compatibility.

> I combine -auto with nmh-storage: /tmp.  I think that's reasonable.

I use -auto -clobber ask

David
