Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [really non-ASCII message bodies ]

2010-12-07 Thread Robert Elz
In this discussion people (other than perhaps Jon, though he hasn't
said this explicitly) have just been assuming that if the e-mail body
of a message contains data that is not ascii, then it must be some other
character set, because after all, all e-mail is text ...

In the days of MIME, that's simply not true, and while it is unlikely
that anyone is going to use prompter, or even some other editor, to
produce a jpeg file, there's nothing to prevent a script producing a
file with a jpeg body, and 822 headers, and handing that to nmh to process.

We might prefer such a script to generate all the right headers, but MH
really doesn't like it if we attempt to tell it mime info in the
components file (or the draft) - it insists on adding that itself, so
not doing all the content type processing before calling nmh processes
is understandable.

Now there is nothing at all illegal about this - even ignoring that
"illegal" is the wrong word to use in any case, "non-conforming" would
be the correct term.   The standards don't apply to what the user feeds
nmh, that's locally defined "anything goes" territory.   What matters is
what nmh hands to the MTA (and even more, what the local MTA passes on
to its peer).   There if we simply send an 822 (old style, non-MIME) message
with arbitrary binary content we have a non-conforming message.

That's what (as I understand it) Jon's patch handles (I still use the
latest released version of nmh, which predates all this stuff ... there
hasn't been a new release (since 1.3) for a long time now...) and makes a
standards conforming message.   It obviously has no way of knowing what the
data that it detects as non-ascii is (or not without extra information from
the sender), so "application/octet-stream" sounds to me as if it is the
perfect choice (along with either QP or B64 encoding to handle the body
format) to indicate "here is stuff, but I have no idea what it means, work
it out for yourself" - which for many users of this kind of procedure
would probably be adequate.
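
For illustration only, the base64 half of that could look like the sketch
below.  It is a generic RFC 4648 encoder, not nmh's own code; the name
base64_encode is made up here, and the 76-column line folding that a mailer
also needs is left out.

```c
#include <stddef.h>

/* Encode len bytes from in into out as base64 (RFC 4648 alphabet).
 * out must hold at least 4 * ((len + 2) / 3) + 1 bytes.
 * Illustrative sketch: no line wrapping is done. */
static void base64_encode(const unsigned char *in, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i;

    /* Full 3-byte groups become 4 output characters each. */
    for (i = 0; i + 2 < len; i += 3) {
        *out++ = tbl[in[i] >> 2];
        *out++ = tbl[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
        *out++ = tbl[((in[i + 1] & 0x0f) << 2) | (in[i + 2] >> 6)];
        *out++ = tbl[in[i + 2] & 0x3f];
    }
    /* One or two leftover bytes are padded with '='. */
    if (i < len) {
        *out++ = tbl[in[i] >> 2];
        if (i + 1 < len) {
            *out++ = tbl[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
            *out++ = tbl[(in[i + 1] & 0x0f) << 2];
        } else {
            *out++ = tbl[(in[i] & 0x03) << 4];
            *out++ = '=';
        }
        *out++ = '=';
    }
    *out = '\0';
}
```

Fed the eight bytes of "with Jon", this yields d2l0aCBKb24=, which matches
the sample payload quoted elsewhere in this thread apart from the trailing
pad character.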

I'm certainly not arguing that we should keep this behaviour, and certainly
not as the default - I expect that real users of binary message bodies that
are not text are so rare that, even if there are any at all, updating them
would not be a huge problem (provided the change notes for the next nmh
release make it clear this has happened).

However, I don't think we should give up the ability to simply send an
e-mail where the body is image/jpeg or whatever - there's no requirement that
there be any text in the body of the message at all, even though most MUA's
simply assume that, and require a multipart to include anything that is not
text.  MH should be better than that, being just as good as "most MUA's" is
a fairly grievous insult IMO.  And while retaining the # language of mhbuild,
or something equivalent, is essential to enable truly general messages to
be created, expecting to use that for trivial tasks is, I agree, asking too
much - and requiring explicit mime processing at the whatnow stage should
only be necessary when the full mhbuild procedure is to be invoked.
(Do recall that when this was added, MIME messages were rare, and lots of
users didn't like them - most MUA's had no way to display them, not even
as "good" as nmh does now - and so wanting that processing was very
unusual.  These days, almost every message should comply with at least
basic mime formats.)

My suggestion to handle general bodies is to allow a switch that sets the
MIME content-type of the message (defaulting to text/plain) - and then base
all the other decisions off that.   If (as a result of the default, or by
being explicitly set) we get a text/* content-type, then we can attempt to
work out the charset involved, and add the proper indicator.  On the
other hand, if someone really wants to send an application/octet-stream,
then let's allow them to do that, or if they want to send image/jpeg or
audio/whatever they should be able to do that too (a message that is
entirely audio/* could even be handled by "show" by playing it through the
local system's speakers, assuming that's possible - implementing voice-email)
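
A minimal sketch of that first decision, assuming a hypothetical helper
(body_is_text is not an nmh function) that receives whatever the proposed
switch was set to, or NULL when it was not given:

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper, not part of nmh: returns 1 when the chosen
 * content type is text/..., i.e. when charset detection applies. */
static int body_is_text(const char *content_type)
{
    /* A missing switch falls back to the proposed default. */
    if (content_type == NULL)
        content_type = "text/plain";
    /* MIME type names are case-insensitive. */
    return strncasecmp(content_type, "text/", 5) == 0;
}
```

Anything for which this returns 0 (image/jpeg, application/octet-stream,
audio/...) would skip the charset guessing and only need a transfer
encoding.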

I also don't believe that this processing should be keyed off some -attach
switch - as a way to simplify adding an attachment to a message (incidentally,
if given twice, can we have two attachments, or is there some other way to
do that?) it sounds OK, but for charset processing?

For text messages, the right thing should be done regardless of whether there's
any plan or intent to add attachments, and using a switch "-attach" in the
profile to mean "encode my text correctly" is bizarre...

I'm all for backwards compatibility, but only backwards compatible for
correct behaviour, keeping all the existing bugs should not be required
(though I think there are environments where even that is expected.)

Even for attachments, as I understand it, that's keyed off a pseudo-header
added to the components file (and so appears in th

Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Ken Hornstein
>> The current behavior was the best idea that I had
>> at the time and nobody has said anything about it until recently.  I don't
>> mind it changing, but I don't want to all of a sudden get complaints from
>> people who were counting on this behavior.
>
>Well, I always considered the current behaviour a bug, but I didn't say
>anything, because the nmh-way of software development seemed pretty 
>inefficient and I didn't want to look at the code myself either.

We have a "way" of software development?  When did that happen? :-)

In all seriousness ... what, exactly, do you mean?  I guess our current
way is, "Anyone who's interested, please contribute!".  I don't see how
that's really much different than other open-source packages.

Also, as long as we're on the subject ... if people want to submit patches
to the list, perhaps formatting them with git format-patch (since, hey,
I went to the whole trouble to convert everything to git) might be
worthwhile.  Or perhaps just doing that when everyone agrees on the
code would make things easier (because the patch could be processed with
git am).

--Ken

___
Nmh-workers mailing list
Nmh-workers@nongnu.org
http://lists.nongnu.org/mailman/listinfo/nmh-workers


Re: [Nmh-workers] SMTP/IMAP/POP Support

2010-12-07 Thread Ken Hornstein
>woe... I said, "as much as possible" --- I mean, let's remove things
>that truly *NOBODY* uses.   

Fair enough; I guess when you said "as much as possible", I started to
get worried.  I mean, it is certainly POSSIBLE to remove POP support
from inc; if the criteria is only things that people don't use, then
I'm fine with that.

>I *WANT* POP support in inc --- but I'm cool if we get "modern" POP
>interface by having inc popen(3) fetchmail, or link against it's
>library, if that's more sane than fixing the code that's there to do
>stuff like POP/SSL.

Well, I can now speak from experience that adding TLS support to our
current code is not terrible.  Despite what others had thought, it
was actually pretty straightforward, even with all of the code in place
to do SASL encryption.  A quick glance at uip/popsbr.c leads me to believe
that adding similar TLS code to the POP library would be relatively simple.
Someone just has to do it, and now they even have example code to crib
from.  I might do it if I get some free time, but certainly someone else
can do it probably a lot sooner than I can.

Or if someone wants to modify the pop code to popen() fetchmail, I'm
alright with that as well.

--Ken



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Lyndon Nerenberg

On 10-12-07 3:48 PM, Jon Steinhart wrote:

> This is the first that anybody has spoken
> up about this as far as I'm aware, so I was trying to protect backward
> compatibility.


A lot of MTAs just accept the stuff, even though it violates the 
standards. The assumption was 'just treat it as 8859-1'. That sort of 
worked long ago, but not any more.


Now that the whole email delivery chain has had to start dealing with 
character set encodings properly I've noticed a (very) slight increase 
in the number of sites that are rejecting un- and mis-encoded non-ASCII 
text.




Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Jon Steinhart
If this is a bug that nobody has bothered with for years then by all
means go ahead and fix it.  This is the first that anybody has spoken
up about this as far as I'm aware, so I was trying to protect backward
compatibility.  No need to do that for bugs though.

Jon



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Valdis . Kletnieks
On Tue, 07 Dec 2010 12:03:38 CST, Earl Hood said:
> On Tue, Dec 7, 2010 at 11:10 AM, Jon Steinhart  wrote:
> > I understand that my attachment system does not handle non-ASCII message
> > bodies, but again, that's because non-ASCII message bodies are not "legal".
> 
> Please cite an RFC that says non-ASCII bodies are not legal.
> 
> With MIME, you have the Content-Transfer-Encoding field, which allows
> for 8bit.  And then, if you have a Content-Type type that supports
> charset parameter, you can "legally" have a body that is non-ASCII.

A MIME message that has a Mime-Version: and appropriate C-T-E: headers can
certainly be non-ASCII.  What's illegal is sending non-ASCII *without* such
headers (which is what nmh has been doing in the past).




Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread David Levine
Peter wrote:

> markus schnalke wrote:
> >The old code generates ...
> >
> >... for ASCII:
> >
> >  Content-Type: text/plain; name="sendKi9x7j"; x-unix-mode="0644";
> >  charset="us-ascii"
> >  Content-ID: <4962.128958967...@argentina.foo>
> >  Content-Description:   ASCII text
> >
> >  foo
> >
> >... for non-ASCII (only if at least one attachment is present):
> >
> >  Content-Type: application/octet-stream; name="sendbRaV8T";
> >  x-unix-mode="0644"
> >  Content-ID: <5209.128958999...@argentina.foo>
> >  Content-Description:   UTF-8 Unicode text
> >  Content-Transfer-Encoding: base64
> >
> >  d2l0aCBKb24
>
> These are definitely just wrong -- we shouldn't be specifying
> name and x-unix-mode for the body text

Adding -attachformat 1 to the send entry of your .mh_profile
will get rid of the name and x-unix-mode.  That option can
also be added when entering send at the whatnow prompt.  The
send man page has examples of what it produces.

If there's consensus to make that the default, it would be an
easy code and documentation change.  (Yes, I'm volunteering
to make the changes.  But not to push for consensus :-)

> (and base64ing when we could q-p is a bit unfriendly).

Blackberries, and I think Droids, unnecessarily base64 text.
But I do agree with you, nmh shouldn't.

David



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread harald . geyer
Jon Steinhart :
> The current behavior was the best idea that I had
> at the time and nobody has said anything about it until recently.  I don't
> mind it changing, but I don't want to all of a sudden get complaints from
> people who were counting on this behavior.

Well, I always considered the current behaviour a bug, but I didn't say
anything, because the nmh-way of software development seemed pretty 
inefficient and I didn't want to look at the code myself either. So,
when I found out that nmh was sending illegal mails (the timestamps of
my filesystem tell me it was 25. Jun 2008), I just "fixed" this
by adding three lines to the components file:

Content-Type: text/plain; charset="iso-8859-15"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

(which of course I have to remove if I actually want to attach a file
to a mail)

I guess, I'm not the only one silently applying some workaround but
I also guess that many more people unknowingly send illegal mail.
So unless somebody on this mailing list states that he actually needs
the current behaviour, I think it is very safe to assume that no such
complaints you are fearing will ever be made.

Given the number of mails wasted on this seemingly obvious question now, I
really regret not filing a bug report 2.5 years ago.

BTW, I also use nmh exclusively for my mail, except for the two times/year
when I actually need to send signed/encrypted mails.

Personally I think that nmh should abort rather than silently send illegal
mails.

Harald



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Peter Maydell
markus schnalke wrote:
>[2010-12-07 12:33] Jon Steinhart 
>Examples for what gets generated from mail *body text*:

Thanks for doing this and saving me the effort ;-)

>The old code generates ...
>
>... for ASCII:
>
>  Content-Type: text/plain; name="sendKi9x7j"; x-unix-mode="0644";
>  charset="us-ascii"
>  Content-ID: <4962.128958967...@argentina.foo>
>  Content-Description:   ASCII text 
>
>  foo
>
>... for non-ASCII (only if at least one attachment is present):
>
>  Content-Type: application/octet-stream; name="sendbRaV8T";
>  x-unix-mode="0644"
>  Content-ID: <5209.128958999...@argentina.foo>
>  Content-Description:   UTF-8 Unicode text 
>  Content-Transfer-Encoding: base64
>
>  d2l0aCBKb24

These are definitely just wrong -- we shouldn't be specifying
name and x-unix-mode for the body text (and base64ing when
we could q-p is a bit unfriendly).

-- PMM



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread markus schnalke
Jon, sorry for the harsh mail. I really was in a rage. :-/

In a recent mail, you said:

> The current behavior was the best idea that I had
> at the time and nobody has said anything about it until recently.

I really don't blame you for what we have; quite the opposite: I am
very grateful for what we have.



[2010-12-07 22:45] markus schnalke 
> 
> I ask other people to take a look and express their opinion.

Thanks to everyone speaking up.


meillo



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread markus schnalke
[2010-12-07 12:33] Jon Steinhart 
> Still trying to understand this.  Decided to finally look at the code instead 
> of
> relying on my fading memory.

:-) Thanks.


> The existing code takes a non-ASCII message body and sends it as an attachment
> of type application/octet-stream.
> 
> Your patch changes this behavior so that it is sent as type text/plain with 
> the
> appropriately chosen character set.

Both correct.

> In order to do this, you test the message body for non-ASCII characters in
> attach().  If you find any, you write an entry directly into the composition
> file instead of calling make_mime_composition_file_entry().

Correct.

> This is changing existing behavior if I understand it correctly.

Behavior in what gets generated, yes, but this is necessary when fixing
things.

Examples for what gets generated from mail *body text*:

The old code generates ...

... for ASCII:

  Content-Type: text/plain; name="sendKi9x7j"; x-unix-mode="0644";
  charset="us-ascii"
  Content-ID: <4962.128958967...@argentina.foo>
  Content-Description:   ASCII text 

  foo

... for non-ASCII (only if at least one attachment is present):

  Content-Type: application/octet-stream; name="sendbRaV8T";
  x-unix-mode="0644"
  Content-ID: <5209.128958999...@argentina.foo>
  Content-Description:   UTF-8 Unicode text 
  Content-Transfer-Encoding: base64

  d2l0aCBKb24


With my patch such MIME parts are generated ...

... for ASCII:

  Content-Type: text/plain; charset="us-ascii"
  Content-ID: <5048.128958978...@argentina.foo>

  foo

... for non-ASCII:

  Content-Type: text/plain; charset="UTF-8"
  Content-ID: <5260.128959006...@argentina.foo>
  Content-Transfer-Encoding: quoted-printable

  Umlauts: =C3=A4 and =C3=B6 and =C3=BC.


The function make_mime_composition_file_entry() gives us nothing but
information we don't need/want (temp file names, file permissions) and
it definitely does not use the best possible CT and CTE for the body
text.
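
To show where a line like "Umlauts: =C3=A4 and =C3=B6 and =C3=BC." comes
from, here is a rough quoted-printable encoder.  It is a sketch under
stated assumptions, not the mhbuild implementation: the name qp_encode is
made up, and the soft line breaks at 76 columns and the trailing-whitespace
rule that RFC 2045 also requires are left out.

```c
#include <stdio.h>
#include <stddef.h>

/* Quoted-printable encode len bytes of text into out (sketch only).
 * out must hold at least 3 * len + 1 bytes.  Printable ASCII other than
 * '=' passes through, as do space, tab and newline; every other byte
 * is written as =XX with two uppercase hex digits. */
static void qp_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = in[i];

        if ((c >= 33 && c <= 126 && c != '=') ||
            c == ' ' || c == '\t' || c == '\n') {
            *out++ = (char)c;
        } else {
            sprintf(out, "=%02X", c);
            out += 3;
        }
    }
    *out = '\0';
}
```

The UTF-8 encoding of "ä" is the two bytes 0xC3 0xA4, which is why a single
umlaut turns into =C3=A4 while the surrounding ASCII text stays readable.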


> This is fine
> with me provided that users must explicitly enable the change using an option.

An option to activate a fix???


> Now that I'm actually looking at the code, I would suggest an option
> (choose a better name) of binary-body-content-type.  You could change the
> make_mime_composition_file_entry() line
> 
>   content_type = binary ? "application/octet-stream" : "text/plain";
> 
> to replace the "application/octet-stream" with the option value, or the 
> existing
> value as a default if the option is not specified.
> 
> A user wanting this behavior would have a profile entry of
> 
>   binary-body-content-type: text/plain
> 
> I think that this would be a simpler code change that would accomplish your 
> goal.

Sorry, but I really think you don't get the point. We don't need
config options if we already have the facilities to do it right
automatically. Why should any user need to tell nmh what content type
to use for the text he writes? The mhbuild facility can already find
out which is appropriate. My patch also distinguishes between mail text
and attachments, for which different things are relevant. Your suggestion
above does not, and would then use binary-body-content-type for any
non-ASCII attachment as well.

I read your mails and ask myself if you really read what I write and
if you have had a look into the code we are talking about.

I very much value the work you did for nmh and you see that this patch
builds on what you created, but now may be the point to either take a
close look at the problem and code or step back and let other people
talk. I do think you don't understand the situation and the relevant
code well enough (presumably due to lack of time).

Of course, I may be wrong. Currently, however, it rather seems to me
as if it's not me who is not understanding the whole thing. I really
spent much time in the code and doing tests. And I did my best
explaining everything.


I ask other people to take a look and express their opinion.


meillo



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Peter Maydell
chad wrote:
>On Dec 7, 2010, at 12:48 PM, Ken Hornstein wrote:
>> You know  I'm all for backwards compatibility and everything, but
>> I'm wondering ... did the previous behavior actually make sense?  Can
>> people argue that it was desirable or correct?  Or was the previous
>> behavior actually wrong, and this is really fixing a long-standing
>> bug?  
>
>It was a bug.  The only suggesting that it wasn't a bug is instead saying 
>that it was `illegal' (which it wasn't... it just usually was a bad idea).

I agree; we should just change the behaviour here. (In particular
the previous behaviour would have differed in how it handled the
body depending on whether there was an attachment or not, which
suggests to me that it's just not a case anybody has cared about
before now.)

(In fact I think we should go ahead and change the behaviour for
not-plain-ASCII bodies even if the user didn't pass the -attach
switch, but I'm guessing I might get argued down on that.)

I do think the "is the body not plain ASCII?" check is not quite right.
I think that the presence of special characters (most notably ESC)
ought to also trigger MIMEification. Otherwise we will not do the right thing
for 7-bit Japanese encodings like ISO-2022-JP. (Yes, I do care about this,
it's not just idle nitpickery.) I would suggest
    if (*p != '\t' && (*p >= 127 || *p < 32)) {
        non_ascii = 1;
    }

ie encode unless it's in the printable ascii range or space or tab.
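
Spelled out as a whole function, the stricter test might look like the
sketch below.  The function name is invented for illustration, and letting
'\n' through is an assumption added here (taken literally, the condition
quoted above would flag every line ending).

```c
#include <stddef.h>

/* Returns 1 if the body contains any byte outside printable ASCII,
 * space, tab and newline, so that ESC and other control characters
 * also trigger MIME encoding.  Hypothetical helper, not nmh code. */
static int body_needs_mime(const char *body, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        /* Go through unsigned char so the result does not depend on
         * whether plain char is signed on this platform. */
        unsigned char c = (unsigned char)body[i];

        if (c == '\t' || c == '\n')
            continue;
        if (c < 32 || c >= 127)
            return 1;
    }
    return 0;
}
```

A body that is pure 7-bit but carries escape sequences (ESC is byte 0x1b)
is caught by the c < 32 branch, which a plain high-bit test would miss.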

-- PMM



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread chad

On Dec 7, 2010, at 12:48 PM, Ken Hornstein wrote:

> You know  I'm all for backwards compatibility and everything, but
> I'm wondering ... did the previous behavior actually make sense?  Can
> people argue that it was desirable or correct?  Or was the previous
> behavior actually wrong, and this is really fixing a long-standing
> bug?  

It was a bug.  The only suggestion that it wasn't a bug is instead saying
that it was `illegal' (which it wasn't... it just usually was a bad idea).

nmh was used so infrequently in places that aren't 7-bit clean that none
of the people who noticed complained; they just binned it with the
host of non-i18n'd software and moved on.  This is generalizing a bit
from a small sample, but I'd be more than astonished if we found
more than a dozen people who use mh exclusively for all their email. ;)

*Chad





Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Jon Steinhart
> >The existing code takes a non-ASCII message body and sends it as an 
> >attachment
> >of type application/octet-stream.
> >
> >Your patch changes this behavior so that it is sent as type text/plain with 
> >the
> >appropriately chosen character set.
> 
> You know  I'm all for backwards compatibility and everything, but
> I'm wondering ... did the previous behavior actually make sense?  Can
> people argue that it was desirable or correct?  Or was the previous
> behavior actually wrong, and this is really fixing a long-standing
> bug?  Because if we decide that the previous behavior is a bug, then
> I don't think an explicit enable option for this chance makes sense; I'd
> prefer that the new behavior be the default.
> 
> (I am personally on the fence regarding whether or not the previous
> behavior is a bug).
> 
> --Ken

Don't disagree with you.  The current behavior was the best idea that I had
at the time and nobody has said anything about it until recently.  I don't
mind it changing, but I don't want to all of a sudden get complaints from
people who were counting on this behavior.  Maybe that number is 0, but I
have no way of knowing.  I don't care that much, so if you think compatibility
isn't an issue here that's fine with me.

If the default behavior were to change I would add a "binary" flag to
make_mime_composition_file_entry() so that the body didn't have to be
scanned for non-ASCII characters twice.

Jon



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Ken Hornstein
>The existing code takes a non-ASCII message body and sends it as an attachment
>of type application/octet-stream.
>
>Your patch changes this behavior so that it is sent as type text/plain with the
>appropriately chosen character set.

You know  I'm all for backwards compatibility and everything, but
I'm wondering ... did the previous behavior actually make sense?  Can
people argue that it was desirable or correct?  Or was the previous
behavior actually wrong, and this is really fixing a long-standing
bug?  Because if we decide that the previous behavior is a bug, then
I don't think an explicit enable option for this change makes sense; I'd
prefer that the new behavior be the default.

(I am personally on the fence regarding whether or not the previous
behavior is a bug).

--Ken



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Jon Steinhart
Still trying to understand this.  Decided to finally look at the code instead of
relying on my fading memory.  Sorry if some of my earlier memory-based comments
were off-base.  Please let me know if my understanding of your proposed patch is
correct.

The existing code takes a non-ASCII message body and sends it as an attachment
of type application/octet-stream.

Your patch changes this behavior so that it is sent as type text/plain with the
appropriately chosen character set.

In order to do this, you test the message body for non-ASCII characters in
attach().  If you find any, you write an entry directly into the composition
file instead of calling make_mime_composition_file_entry().

This is changing existing behavior if I understand it correctly.  This is fine
with me provided that users must explicitly enable the change using an option.

Now that I'm actually looking at the code, I would suggest an option
(choose a better name) of binary-body-content-type.  You could change the
make_mime_composition_file_entry() line

content_type = binary ? "application/octet-stream" : "text/plain";

to replace the "application/octet-stream" with the option value, or the existing
value as a default if the option is not specified.

A user wanting this behavior would have a profile entry of

binary-body-content-type: text/plain

I think that this would be a simpler code change that would accomplish
your goal.
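
Roughly, and only as a sketch: option_value below stands for whatever a
lookup of the suggested binary-body-content-type profile entry would
return (NULL when it is absent).  The helper name and the way the value is
passed in are assumptions made for illustration, not existing nmh code.

```c
#include <stddef.h>

/* Sketch of the proposed change to the content_type line: the
 * hard-coded "application/octet-stream" becomes the default for an
 * optional, user-supplied value. */
static const char *choose_content_type(int binary, const char *option_value)
{
    if (!binary)
        return "text/plain";
    return option_value != NULL ? option_value : "application/octet-stream";
}
```

With no profile entry the behavior is unchanged; with the entry set to
text/plain, a non-ASCII body would be labeled as text.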

Jon



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Ralph Corderoy

Hi,

markus schnalke wrote:
> > BTW, I would suggest using isascii() rather than (*p > 127 || *p < 0).
> 
> I just kept what you once wrote. ;-P
> But, yes, you are right.

Given it's `char *p' then *p may be unsigned on some systems, e.g. ARM,
and a compiler could warn on testing if it's negative so isascii() is
much nicer.  :-)
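
For anyone who wants to see the signedness issue in isolation, here is a
small portable stand-in (the name is_7bit is made up; isascii() itself is a
POSIX extension rather than ISO C, so this sketch avoids calling it):

```c
/* Portable equivalent of the (*p > 127 || *p < 0) test, inverted:
 * converting to unsigned char first gives the same answer whether plain
 * char is signed (bytes >= 0x80 look negative) or unsigned (they look
 * greater than 127). */
static int is_7bit(char c)
{
    return (unsigned char)c < 128;
}
```

On a platform where char is unsigned the "*p < 0" half of the original
test can never be true, which is exactly what a compiler would warn about.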

Cheers,
Ralph.




Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Earl Hood
On Tue, Dec 7, 2010 at 11:10 AM, Jon Steinhart  wrote:
> I understand that my attachment system does not handle non-ASCII message
> bodies, but again, that's because non-ASCII message bodies are not "legal".

Please cite an RFC that says non-ASCII bodies are not legal.

With MIME, you have the Content-Transfer-Encoding field, which allows
for 8bit.  And then, if you have a Content-Type type that supports
charset parameter, you can "legally" have a body that is non-ASCII.

Note, way back when MIME was first defined, it was probably not good
practice to use 8bit CTE since MTAs were not friendly with 8bit data,
but today, that is less likely.

--ewh



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread markus schnalke
[2010-12-07 09:10] Jon Steinhart 
> > [2010-12-07 08:27] Jon Steinhart 
> > > Sounds good.  Is the patch that you sent out complete?  I don't see an 
> > > option
> > > that enables/disables this behavior and I think that there should be one.
> > 
> > I believe it's correct that there is no switch.
> > 
> > If one wants to deactivate it, do not specify -attach.
> > 
> > If -attach is given, I believe the changes are fixes for broken
> > behavior. Your attachment system lacks some awareness for non-ASCII
> > text, which you probably don't deal with much. This is improved with
> > my patch.
> 
> OK.  I think that there should be a switch.  I guess it bugs me to see the
> character-by-character examination of the message body on by default.

The char-by-char examination is ugly, yes.


> I understand that my attachment system does not handle non-ASCII message
> bodies, but again, that's because non-ASCII message bodies are not "legal".
> I think that you have justified extending nmh to handle "illegal" message
> bodies.  I'm just nitpicking on the implementation details.

With MIMEification, they are legal.

I want nmh to convert illegal draft messages to legal messages.
Currently nmh sends illegal messages if the user composes such ones.
With my patch nmh takes care to send only legal messages. Programs should
support humans if possible.


> Could you please explain again how you get the character set information
> for non-ASCII message bodies?  Sorry that I didn't save your original
> message on this.  I seem to recall that you got it from the profile; I
> would rather see you get this from the LANG environment variable.

I just leave it up to buildmimeproc to find out. :-) We don't need to
do it at several places. I only say it's text/plain but nothing about
the encoding.

The man page of mhbuild(1) writes:

If a text content contains any 8-bit characters (characters with the
high bit set) and the character set is not specified as above, then
mhbuild will assume the character set is of the type given by the
environment variable MM_CHARSET.  If this environment variable is not
set, then the character set will be labeled as “x-unknown”.

If a text content contains only 7-bit characters and the character set
is not specified as above, then the character set will be labeled as
“us-ascii”.

This information is probably outdated, but in general it makes the
point; the code is probably already better (with respect to MM_CHARSET).
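
As a sketch of the rule in that man page excerpt (and only of the excerpt;
the current mhbuild code may well differ), with an invented function name
and the environment value passed in by the caller:

```c
#include <stddef.h>

/* Labeling rule as described in the quoted mhbuild(1) text, for a text
 * part whose charset was not given explicitly.  mm_charset is the value
 * of the MM_CHARSET environment variable, or NULL when it is unset;
 * a caller would pass getenv("MM_CHARSET"). */
static const char *label_charset(int has_8bit, const char *mm_charset)
{
    if (!has_8bit)
        return "us-ascii";      /* only 7-bit characters */
    if (mm_charset != NULL)
        return mm_charset;      /* trust the user's declared charset */
    return "x-unknown";         /* 8-bit data, nothing declared */
}
```

Passing the environment value in as a parameter keeps the rule easy to
test; nothing in the excerpt says the real code is structured this way.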


> BTW, I would suggest using isascii() rather than (*p > 127 || *p < 0).

I just kept what you once wrote. ;-P

But, yes, you are right.


meillo



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Jon Steinhart
> [2010-12-07 08:27] Jon Steinhart 
> > Sounds good.  Is the patch that you sent out complete?  I don't see an 
> > option
> > that enables/disables this behavior and I think that there should be one.
> 
> I believe it's correct that there is no switch.
> 
> If one wants to deactivate it, do not specify -attach.
> 
> If -attach is given, I believe the changes are fixes for broken
> behavior. Your attachment system lacks some awareness for non-ASCII
> text, which you probably don't deal with much. This is improved with
> my patch.
> 
> meillo

OK.  I think that there should be a switch.  I guess it bugs me to see the
character-by-character examination of the message body on by default.

I understand that my attachment system does not handle non-ASCII message
bodies, but again, that's because non-ASCII message bodies are not "legal".
I think that you have justified extending nmh to handle "illegal" message
bodies.  I'm just nitpicking on the implementation details.

Could you please explain again how you get the character set information
for non-ASCII message bodies?  Sorry that I didn't save your original
message on this.  I seem to recall that you got it from the profile; I
would rather see you get this from the LANG environment variable.
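
To make the LANG idea concrete, here is a rough sketch that pulls the
codeset out of a locale name of the usual form
language[_territory][.codeset][@modifier]. The function name is
hypothetical and this is not code from the patch. On POSIX systems the
more usual route is setlocale(LC_CTYPE, "") followed by
nl_langinfo(CODESET), which also honors LC_ALL and LC_CTYPE; both of
those take precedence over LANG.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the codeset part of a locale name such as "de_DE.UTF-8@euro"
 * into out (NUL-terminated).  Returns 1 on success, 0 if the name
 * has no codeset part or out is too small.
 */
int codeset_from_locale_name(const char *name, char *out, size_t outlen)
{
    const char *start, *end;
    size_t n;

    if (name == NULL || (start = strchr(name, '.')) == NULL)
        return 0;
    start++;                            /* skip the '.' */
    end = strchr(start, '@');           /* optional modifier */
    n = end ? (size_t)(end - start) : strlen(start);
    if (n == 0 || n >= outlen)
        return 0;
    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}
```

A caller would pass getenv("LANG") as the first argument and fall back
to some default label when the function returns 0.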

BTW, I would suggest using isascii() rather than (*p > 127 || *p < 0).

Thanks,
Jon



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread markus schnalke
[2010-12-07 08:27] Jon Steinhart 
> Sounds good.  Is the patch that you sent out complete?  I don't see an option
> that enables/disables this behavior and I think that there should be one.

I believe it's correct that there is no switch.

If one wants to deactivate it, do not specify -attach.

If -attach is given, I believe the changes are fixes for broken
behavior. Your attachment system lacks some awareness for non-ASCII
text, which you probably don't deal with much. This is improved with
my patch.


meillo



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread Jon Steinhart
Sounds good.  Is the patch that you sent out complete?  I don't see an option
that enables/disables this behavior and I think that there should be one.

Jon



Re: [Nmh-workers] Understanding nmh (aka. What's the goal) [ really non-ASCII message bodies ]

2010-12-07 Thread markus schnalke
Hoi,

discussing these things has not always been easy, but the points and
arguments became clear and we have now reached some kind of consensus.
For me, the discussion has been worthwhile.


[2010-12-03 11:33] Jon Steinhart 
> 
>  o  Nobody objects to markus addressing this issue.  The objections are that
> his implementation breaks things, and handling illegal body content is
> not a compelling enough reason for breaking things.

> So, I think that enough has been said on this topic.  markus, can you outline
> for us an implementation that doesn't break things?  I think that everyone
> will bless your changes if you do.

I agree with you here. Hence I created a new patch that concentrates
on the fourth case, explained in the other mail. AFAIS it does not
break anything.

Let me explain:

As it only modifies your attachment system now, everything is the same
if -attach is not specified.

If -attach is specified, the situation is as follows:

  no attachment hdr  +  body contains only ASCII  ->  sent as is
  attachment hdr     +  body contains only ASCII  ->  MIMEified
  attachment hdr     +  body contains non-ASCII   ->  MIMEified
  no attachment hdr  +  body contains non-ASCII   ->  MIMEified

The fourth case is different.
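
The four cases above boil down to a single condition, sketched here as
a standalone function. The name needs_mime is made up for this example;
in the patch the same test appears inline in attach() as the early
return for "no attachments and only ASCII text".

```c
/*
 * Decide whether a draft sent with -attach must be converted to MIME.
 * has_attachment: draft has at least one attachment header
 * non_ascii:      message body contains non-ASCII characters
 * Returns 1 if the draft is MIMEified, 0 if it is sent as is.
 */
int needs_mime(int has_attachment, int non_ascii)
{
    return has_attachment || non_ascii;
}
```

Only the combination "no attachment header, ASCII-only body" is sent as
is; every other combination is MIMEified.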

Additionally, the body text will be sent with a correct MIME type in
every case. Currently it is sent as application/octet-stream in the
third case.


The relation to `mime' at the whatnow prompt:

One surely wants to unset automimeproc when using -attach.

Running `mime' at the whatnow prompt is usually not needed as Jon's
attachment system handles it automatically.

Collisions only occur if an attachment header is present in the mail
and one runs `mime' at the whatnow prompt. Whether the body text
contains non-ASCII chars or not is irrelevant; it works as expected in
both cases.

As long as one does not add attachment headers to a specific draft,
one is able to use any mhbuild directives (/^#/) when running `mime'
at the whatnow prompt afterwards.


Further work:

The documentation currently does not cover my changes. There is not
much to change, and I would like to do that if the proposed changes
are accepted.

More complex MIME structures than ``text followed by attachments'' are
not possible with Jon's attachment system. (Like they are not with
most MUAs.) One needs to create them with mhbuild directives and run
`mime' manually. (For forwarding messages, see below.)

Jon's attachment system still needs mhshow-suffix- entries or it will
be really dumb. This is something that should be covered separately,
maybe by a conceptual redesign (automatic detection, mailcap, ...).

Forwarding messages in MIME format could be added to Jon's system in a
way similar to what I proposed initially. I believe this would be
possible without breaking stuff. We would need to add -attach to
forw(1).


meillo


> P.S.  I'm trying to honor the way that your name appears in your mail header.
>   Do you really want it to be "markus" or should it be "Markus"?

Usually, I prefer ``meillo'' because that's a nearly unique
identifier. If you want to use my real name, I don't care if you spell
it in lower-case or with capital `M'. More important is honoring my
work by mentioning my name in the ChangeLog or commit messages. ;-)

diff --git a/uip/sendsbr.c b/uip/sendsbr.c
index 57ef007..8f5f2e1 100644
--- a/uip/sendsbr.c
+++ b/uip/sendsbr.c
@@ -196,6 +196,7 @@ attach(char *attachment_header_field_name, char *draft_file_name,
 int			c;			/* current character for body copy */
 int			has_attachment;		/* draft has at least one attachment */
 int			has_body;		/* draft has a message body */
+int			non_ascii;		/* msg body contains non-ASCII chars */
 int			length;			/* length of attachment header field name */
 char		*p;			/* miscellaneous string pointer */
 
@@ -228,29 +229,36 @@ attach(char *attachment_header_field_name, char *draft_file_name,
 	if (strncasecmp(field, attachment_header_field_name, length) == 0 && field[length] == ':')
 	has_attachment = 1;
 
-if (has_attachment == 0)
-	return (DONE);
-
 /*
- *	We have at least one attachment.  Look for at least one non-blank line
- *	in the body of the message which indicates content in the body.
+ * Check if body contains at least one non-blank char (= not empty)
+ * and if it contains non-ASCII chars (= need MIME).
+ * We MIMEify the message also if the body contains non-ASCII text.
  */
 
 has_body = 0;
+non_ascii = 0;
 
 while (get_line() != EOF) {
 	for (p = field; *p != '\0'; p++) {
-	if (*p != ' ' && *p != '\t') {
+	if (*p != ' ' && *p != '\t')
 		has_body = 1;
+	if (*p > 127 || *p < 0) {
+		non_ascii = 1;
 		break;
 	}
 	}
-
-	if (has_body)
+	if (non_ascii)
 	break;
 }
 
 /*
+ * Bail out if there are no attachments and only ASCII text.
+ * This means we don't need to convert it to MIME.
+ */
+if (!has_attachment && non_ascii == 0)
+	return (DONE);