Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Aleksander Matuszak
Ken Hornstein writes:

> >Unfortunately, I have a lot of experience and troubles with character
> >set conversion. 
> 
> Well, if you just bit the bullet and switched to UTF-8, you wouldn't have
> all of these problems! :-)

It is not that simple. Utf-8 solves a couple of problems but creates
some new ones =:-) The advantages and disadvantages of utf-8 are a
very wide topic.


> >In practice it means a spam in exotic language and at this point I know
> >that I do not want to read such a message. 
> 
> I can see that, but I'm not sure that's an appropriate choice for all
> cases (like, for instance, MIME parameters).

That is right. On the other hand, you can never prevent malformed MIME
parameters.


> >This is very frequent and causes a lot of trouble. An entire message in
> >English and one foreign family name in the original. Message sent in utf-8
> >but (suppose) my terminal supports only ASCII. Conversion would fail. 
> 
> Errr ... really?  In the case I'm thinking, the one foreign family
> name would have the offending character output as a '?' (or whatever).
> The conversion would go through fine.

Well, it depends on the meaning of the word "fail". Formally, it is not
possible to map every utf-8 character onto the 256 characters of an
iso/cp/... 8-bit set, so the conversion would fail.

Ignoring the absent symbols, or substituting something else for them,
lets the conversion go through fine.

However, ignoring symbols or substituting '?' for them makes the
conversion non-reversible, and the result may be difficult to read.

This is not a problem when one or two symbols are missing or
substituted in a long text. We can guess what the me?ning of the word
is. With many non-convertible symbols, reading such a text is more
like solving a crossword puzzle. What could '??o??w??d' be?
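
The three outcomes discussed above (formal failure, substitution by
'?', silent dropping) can be sketched in a few lines of Python. This
is only an illustration: the sample name is made up, and Python's
codec error handlers stand in for whatever iconv-based code nmh would
actually use.

```python
# Three policies for characters that do not exist in the target charset.
# The sample name is hypothetical; escapes are used so the source is pure ASCII.
name = "J\u00f3zef \u017b\u00f3\u0142w"

def to_ascii(text, errors):
    """Encode text to ASCII with the given error policy and return a str."""
    return text.encode("ascii", errors=errors).decode("ascii")

# "strict": the conversion formally fails.
try:
    to_ascii(name, "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "replace": every missing character becomes '?' (not reversible).
replaced = to_ascii(name, "replace")

# "ignore": missing characters are dropped (not reversible either).
ignored = to_ascii(name, "ignore")

print(strict_failed, replaced, ignored)
```

With one or two substitutions the word is still guessable; the more
characters are replaced, the closer the result gets to the crossword
puzzle.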
 
> >In my personal opinion a very good choice is conversion into
> >html-entities, like &aogon; or &lstrok; . It remains quite readable and
> >is still unique enough to convert it back in case of need.
> 
> Um, ouch.  Unless there's a common library that already implements
> that behavior, that's not on the table at all.

This is a serious argument. However, the Recode library mentioned
earlier has something like that: 
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT

I do not know whether it is useful or not.
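
As a rough sketch of the entity idea (not of Recode itself), Python's
standard library can already do a reversible ASCII-only conversion.
Note the assumption: this uses numeric character references rather
than named SGML entities, and the helper names are invented for the
example.

```python
import html

def to_entities(text):
    """Return an ASCII-only form; non-ASCII characters become references
    such as &#261; instead of being dropped or replaced by '?'."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

def from_entities(text):
    """Convert character references back into the original characters."""
    return html.unescape(text)

original = "\u0105 \u0142"          # a-ogonek and l-stroke
encoded = to_entities(original)
print(encoded)
assert from_entities(encoded) == original
```

One caveat: html.unescape also decodes references that were already
present in the text, so a literal '&' would have to be escaped first
for an exact round trip.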

max


___
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers


Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 Thread Aleksander Matuszak
Ken Hornstein writes:

> I've been grappling with to do when we have issues with character set
> conversion.  

Unfortunately, I have a lot of experience of, and trouble with,
character set conversion. 

> Specifically, I have two issues:
> 
> - What to do if the character set is unsupported.

> Should we return the original bytes?  

It is not the best idea. Some byte sequences are control sequences
for the terminal. This sometimes leaves the terminal in an unusable state.
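
One possible middle way between "return the original bytes" and "an
error" is to show the bytes in an escaped form. A minimal sketch,
assuming that tab, newline and printable ASCII are the only bytes
considered safe (the function name is made up):

```python
def make_terminal_safe(raw: bytes) -> str:
    """Render raw bytes for display: keep tab, newline and printable
    ASCII, show every other byte as a visible \\xNN escape so that it
    cannot act as a terminal control sequence."""
    out = []
    for b in raw:
        if b in (0x09, 0x0A) or 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append("\\x%02x" % b)
    return "".join(out)

# ESC [ 2 J would normally clear the screen; here it is only printed.
print(make_terminal_safe(b"ok\x1b[2J\xff\n"), end="")
```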

> An error? [..]  Some string which says, "We cannot convert
> klingon-8842 to us-ascii" or the equivalent?
> 

In practice this means spam in an exotic language, and at that point I
know that I do not want to read such a message. 

In the rare cases when I do want to read a message in a charset
unsupported by my configuration, it is an advantage of the mh system
that it can be handled separately. Save, decode, convert .. whatever.


> - What to do when we cannot convert a particular character.  This is a
> little more clear; the general trend is to use a substitution
> character.

This is very frequent and causes a lot of trouble. An entire message
is in English, with one foreign family name in its original spelling.
The message is sent in utf-8 but (suppose) my terminal supports only
ASCII. Conversion would fail. 

I can prepare an example, but including it in this message could make
it difficult to read.

In my personal opinion a very good choice is conversion into
html-entities, like &aogon; or &lstrok; . The text remains quite
readable and is still unique enough to be converted back if needed.

max




Re: [Nmh-workers] UTF-8 message bodies

2012-05-30 Thread Aleksander Matuszak
Ken Hornstein writes:

> >It is possible to keep almost unchanged state with addition of
> >one more clause to mhbuild like pair #off #on which marks the
> >region where ^# is not interpreted as directive.
> 
>  But to me it seems dumb that # characters can't be in the
>  beginning of a line, and having people have to know about
>  #on/#off directives just seems like the wrong solution.  [..]
>  But if you run "mime" at a WhatNow?  prompt then presumably
>  you're smart enough to know you have to escape any leading #
>  characters.

I'm trying to write as briefly as possible (to make fewer grammar
mistakes =:-)), but sometimes it is too brief.

Suppose you use automimeproc: 1 and you want to include (as a
part of the message) some lines from a program source, a shell script
or whatever.

You can type
#off
:r whatever (or copy with the mouse)
#on
and you do not have to edit (or remember to edit) the included part to
escape each leading #.

Since it is an additional directive, the user doesn't have to know
about it unless he needs it.
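
The proposal could be prototyped outside mhbuild as a small filter
run over the draft first. The sketch below is only an illustration:
#off and #on are the hypothetical markers suggested above, and it
relies on the mhbuild convention (as documented in mhbuild(1), to the
best of my knowledge) that a doubled leading '#' yields a literal one.

```python
def escape_off_regions(lines):
    """Yield draft lines with a leading '#' doubled inside #off ... #on.

    The '#off' and '#on' marker lines are hypothetical; they are
    consumed here and never reach mhbuild.
    """
    protected = False
    for line in lines:
        stripped = line.rstrip("\n")
        if stripped == "#off":
            protected = True
            continue
        if stripped == "#on":
            protected = False
            continue
        if protected and line.startswith("#"):
            line = "#" + line      # '##...' is sent as a literal '#...'
        yield line

draft = ["Hi\n", "#off\n", "#include <stdio.h>\n", "int x;\n", "#on\n"]
print("".join(escape_off_regions(draft)), end="")
```

Lines outside the marked region pass through untouched, so real
directives keep working.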

   max




Re: [Nmh-workers] UTF-8 message bodies

2012-05-30 Thread Aleksander Matuszak
Ken Hornstein writes:

> >Well, it seems that both approaches can coexists if buildmimeproc
> >would do nothing in case of already MIMEfied message. Instead of
> >error reporting.
> 
> There is one caution here ... if you have a line that begins with a "#"
> that is NOT an mhbuild directive then you'll get an error.  That's fine for
> us that know about it, but it can bite you.
> 

I know it all too well already. I have to use automimeproc=1 because
of the charset. Every time a message contains #include or #! I see
an error.

I've been thinking a bit about this and I can see (as a user, not a
programmer) a couple of possibilities. 

It is possible to keep the current state almost unchanged by adding
one more clause to mhbuild, such as a pair #off #on which marks a
region where ^# is not interpreted as a directive.

Alternatively, the directive typed by the user can be separated from
the (long) directive for mhbuild. The latter can be almost unique,
since it would never be typed directly. The user directive can then be
very flexible, with any starting sign (e.g. one could use
@attach file.ext). Processing user directives would require an
additional preprocessor (supplied or user-made, like a sed script).
The real MIMEfying would be done by mhbuild before sending.
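
A minimal sketch of such a preprocessor, under stated assumptions:
@attach is the hypothetical user directive from the paragraph above,
and the generated line follows the "#type/subtype [description] file"
directive form as I read it in mhbuild(1). The content type is guessed
from the file extension.

```python
import mimetypes

def expand_user_directives(lines):
    """Translate hypothetical '@attach path' lines into mhbuild-style
    directives; all other lines are passed through unchanged."""
    prefix = "@attach "
    for line in lines:
        if line.startswith(prefix):
            path = line[len(prefix):].strip()
            # Fall back to a generic type when the extension is unknown.
            ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
            yield "#%s [%s] %s\n" % (ctype, path, path)
        else:
            yield line

draft = ["See the enclosed file.\n", "@attach report.zzz9\n"]
print("".join(expand_user_directives(draft)), end="")
```

Because the user-facing syntax lives only in the preprocessor, it can
be changed freely without touching mhbuild.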

Those are just ideas. I would like to know your opinion.

   max





Re: [Nmh-workers] More than one parameters in .mh_profile

2012-05-30 Thread Aleksander Matuszak
Ken Hornstein writes:

> >I need moreproc to be "less -force" but show (nmh-1.3) refuses
> >this.
> 
> Yeah, I guess what happens there is mhl (or whatever) is trying to
> exec("less -force").  Which as you've noted doesn't work.
> 
> Other people have complained about this as well.  But in this case you
> could just set the environment variable LESS to "f", right?
> 

Not quite. In fact I need -force only in show, to silently force
the display of incompatible charsets.

> >Workaround is to make the shell script like vim-mail which is in
> >fact call to vim  -c ":set ft=mail" .
> >
> >Is it possible to do such things simpler?
> 
> Right now ... no.  To start, I have no idea how this interface should
> look like.  Suggestions here are welcome; code is even more welcome :-)
>

It does not seem too difficult to implement a function which splits
any string into separate pieces and prepends them to the exec*
parameters. But the discussion shows that the fundamental question is
rather: should the string be passed to the shell (which gives a chance
to use !$ or some such), or should the code do the shell's job and
interpret the string itself?
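
The two options can be put side by side in a short sketch. It is
written in Python only for brevity (nmh itself is C), and the function
names are invented; shlex.split does quote-aware splitting without any
shell expansion.

```python
import shlex

def argv_without_shell(setting):
    """Split a profile value such as 'less -force' into an argv list,
    honouring quotes but performing no shell expansion at all."""
    return shlex.split(setting)

def argv_via_shell(setting):
    """Hand the whole string to /bin/sh, which then does its own parsing
    (variables, globs, pipes, ...)."""
    return ["/bin/sh", "-c", setting]

print(argv_without_shell('vim -c ":set ft=mail"'))
print(argv_via_shell("less -force"))
```

The first form is predictable and cheap; the second is more powerful
but makes the behaviour depend on the user's shell.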

In the end, I think it is not worth solving now.

   max




Re: [Nmh-workers] UTF-8 message bodies

2012-05-28 Thread Aleksander Matuszak
Ralph Corderoy writes:

> Hi max,
> 
> > Why not set in .mh_profile
> > automimeproc: 1
> 
> I like to look over the mime'd draft before sending to check I got the
> directives right.

Well, it seems that both approaches can coexist if buildmimeproc
did nothing for an already MIMEfied message, instead of
reporting an error.

   max



Re: [Nmh-workers] UTF-8 message bodies

2012-05-28 Thread Aleksander Matuszak
Joel Uckelman writes:

> My wife pointed out to me today that for the past seven years (!) since
> the Linux distribution we use switched to UTF-8, when she sends messages
> containing non-ASCII characters [..] and that nmh is
> producing messages which lack any headers indicating that the contents
> are UTF-8.
> 
> Is there a solution to this with nmh 1.4 or 1.5? All I turn up when I
> search for things about character sets and nmh are things related to
> displaying received messages in nmh, nothing about outgoing mail.

Why not set in .mh_profile
automimeproc: 1

Today non-MIME messages are slightly obsolete, even if they have to
be accepted for backward compatibility.

   max



[Nmh-workers] More than one parameters in .mh_profile

2012-05-28 Thread Aleksander Matuszak
In the .mh_profile some entries specify programs, like:

Editor: vim-mail
moreproc: less
postproc: /usr/lib/mh/post


Some of those programs require options or parameters, but
apparently this is not accepted.

I need moreproc to be "less -force" but show (nmh-1.3) refuses
this.

The workaround is to make a shell script like vim-mail, which is in
fact a call to vim  -c ":set ft=mail" .

Is it possible to do such things more simply?


   max




Re: [Nmh-workers] mailxi and attachements

2012-01-30 Thread Aleksander Matuszak
Ralph Corderoy writes:

> Hi,
> 
> I still use mail(1) for sending one-line emails or in pipes.  

More important (for me) is the possibility of sending only file(s) as
attachments.

echo Enclosed | nail -a some.file some...@somewhere.net 

Mutt has similar functionality with its -a switch. I can't do it via
nmh. Composing a message with an attachment is not very easy either.
For me, some form of -a switch in comp and repl is missing. Using a
mail client from a different subsystem is not convenient, e.g. I can't
use .mh_aliases.

> It's provided here by the heirloom-mailx package, derived from
> Berkeley Mail 8.1 but brought up to date.  A concise list of
> features, http://heirloom.sourceforge.net/mailx.html

It depends on the package. Sometimes it is installed as nail and -a
means attachment; sometimes it is installed as mailx and, for
backward compatibility, -a has a different meaning. Then
heirloom-mailx loses its most important feature.

   Max



Re: [Nmh-workers] repl and mime handling

2012-01-18 Thread Aleksander Matuszak
Ken Hornstein writes:

> > Couple of years ago emacs switch to "internal" coding in utf-8. I
> > had to stop using emacs and mh-e. 
> 
> See, this is what I'm missing - why, exactly?  I assume the problem was not
> just philsophical.

Neither political nor religious =:-) Purely practical.

Imagine replying to a utf-8 message. The appended signature is in
iso, so I start with a mixture of charsets. Then I have a lot of
codings to play with: conversion of the utf-8 part of the message,
the headers of the message, three types of internal emacs codings.
It was possible to set it all correctly, but I realized that my
life is too short for such games.

How to process ... and have some work done. =:-)

I was advised to switch to utf-8, but this would not help
much.  Imagine replying to an iso message   
And inserting a text file from disk.

With regret I had to find another editor.

   Max




Re: [Nmh-workers] repl and mime handling

2012-01-18 Thread Aleksander Matuszak
Lyndon Nerenberg writes:

> >> But restrict the entire nmh to utf-8 charset would cripple system.
> 
> How so, specifically?  Plan9 has run a native UTF8-only mail environment 
> for ages (with a very MH-like mailstore, as well), and it's far from 
> crippled.  

There is a tiny difference between "entire" and "internally". 

Normally, no one is interested in how the TV signal is processed
inside the TV set, as long as he can see on the screen what he
wants. So I am not against utf-8 used internally. Almost.

A couple of years ago emacs switched to "internal" coding in utf-8. I
had to stop using emacs and mh-e. 

So there can be another tiny difference between "internally" and
"internally".


   Max





Re: [Nmh-workers] repl and mime handling

2012-01-18 Thread Aleksander Matuszak
Ralph Corderoy writes:

> Hi Aleksander,
> 
> > For English-speaking countries UTF-8, in majority, means ASCII, they
> > can see no difference.
> 
> I don't think that's the case.  Even North Americans, who have $ in
> ASCII, still find ` ' " " and ... cropping up, especially when services
> automatically convert ` ' " " and   And then there's L and Euro.

Well, some people/software find it funny to use LEFT SINGLE
QUOTATION MARK instead of '. But some converters are clever. Mine
treats it correctly, and it took me some time to find what you
were writing about.

> 
> > As an advantage they can use foreign names like Moebius in original,
> > this makes message more readable.  But I'm afraid they wouldn't be
> > happy with message written in Russian, Chinese or Korean. 
> 
> The UTF-8 fonts on systems like Linux, and I assume Windows and Mac too,
> handle these just fine;  Cyrillic, Chinese, and Japanese spam turns up
> here daily and mhshow copes.
> 

Do not confuse a message that is perfectly displayed with a message
that is perfectly readable.

> > But restrict the entire nmh to utf-8 charset would cripple system.
> 
> What language/charset/locale is it that you have where UTF-8 causes
> problems?

My mental system.  Try a test. Take a file with code (any
programming language) and replace all normal characters like
space, '"- etc. with their funny equivalents from utf-8. Send
this file to a programmer who works in a utf-8 environment. Measure
the time until he finds the reason for the problems.
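
For whoever ends up on the receiving side of that test, a small
checker shortens the search considerably. This is only a sketch with
an invented function name; it reports every non-ASCII character
together with its Unicode name.

```python
import unicodedata

def find_non_ascii(source):
    """Return (line, column, character, Unicode name) for every
    non-ASCII character in source; lines and columns are 1-based."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits

# Curly quotes and a no-break space hidden in otherwise normal code.
sample = "x = \u201chi\u201d\ny\u00a0= 1"
for lineno, col, ch, name in find_non_ascii(sample):
    print("%d:%d U+%04X %s" % (lineno, col, ord(ch), name))
```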

I do not want to discuss the (dis)advantages of this or that. There
are people who do not want to work with utf-8, for better or worse
reasons. No software should force them to.

Cheers, Max



Re: [Nmh-workers] repl and mime handling

2012-01-18 Thread Aleksander Matuszak
Joel Uckelman writes:

> Thus spake Oliver Kiddle:
> > 
> > The limitations occur where e-mails use characters that can't be
> > displayed in the current locale but we can't do anything about that.
> > 
> How likely is it that a message containing characters undisplayable in 
> the user's locale will be useful for the user? (This isn't meant
> rhetorically, it's a serious question.) 

It is not that simple. For years I have forced the display of the
iso-8859-1 charset on a terminal supporting only iso-8859-2, and it
works.

1. The charset declared in the mail header. Quite a lot of people have
an incorrectly configured charset. 

2. The language of the message. It might be different from what the
charset suggests. For almost any charset the basic ASCII part is the
same, so a message written in English would be readable.

3. Rare non-Latin characters (e.g. names, cities) may force the MUA
to switch to another charset, while almost the whole text is
readable.

On the other hand, a message written in (say) Japanese would be
unreadable even if perfectly displayed =:-) But the same would
happen with a message written in a foreign language that uses
the same charset as mine.

The relation "supported charset => readable message" is not so
strong. 

   Max




Re: [Nmh-workers] repl and mime handling

2012-01-18 Thread Aleksander Matuszak
Ken Hornstein writes:
 
> There is a good chunk of code inside of nmh that assumes ASCII (in
> terms of "what is a space", "what is a newline", and other things).
> [..] 
> internal representation UTF-8.  

What do you mean by "internal representation"? Conversion from
any charset to utf-8, processing by the code and conversion back to
the original charset is really internal, transparent to the user.

> Now my plan
> was to convert from UTF-8 to the native character set, but that
> conversion won't be perfect.

But such an internal conversion would be perfect: no new
characters are introduced (except formatting like newlines and
spaces).
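
The round trip described here is easy to check for a single-byte
charset. A minimal sketch, with a made-up helper name and sample text;
Python's str plays the role of the internal Unicode representation.

```python
def round_trip(raw, charset):
    """Decode raw bytes from charset into Unicode, then encode back."""
    text = raw.decode(charset)      # original charset -> internal form
    return text.encode(charset)     # internal form -> original charset

original = "Za\u017c\u00f3\u0142\u0107".encode("iso-8859-2")
assert round_trip(original, "iso-8859-2") == original
print("round trip is lossless for this sample")
```

The loss only appears when the final charset differs from the
original one, which is exactly the case of converting to "the native
character set".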

The question is: which charset will the draft be in for editing?
The original one, one converted to something (e.g. according to the
locale), or utf-8? This is no longer internal.

> 
>  I'm writing the code, I'm the one
> who makes the decisions.  [..] 
> you can give me your OPINION, 

Clear statement.

   Max



Re: [Nmh-workers] repl and mime handling

2012-01-18 Thread Aleksander Matuszak
Ken Hornstein writes:

> >*Please, no!* Conversion from any charset to utf-8 is possible, but
> >conversion back, according to user preferences, is not. People
> >start to use funny characters like non-breakable space and so on.
> 
> Unfortunately, we don't have unlimited development resources.
> 
> Here's my reading of the world:
> 
> - The general trend (especially in English-speaking countries) is to move
>   toward Unicode (specifically, UTF-8).

For English-speaking countries UTF-8 mostly means ASCII; they can
see no difference. As an advantage they can use foreign names like
Moebius in the original, which makes a message more readable. 
But I'm afraid they wouldn't be happy with a message written in
Russian, Chinese or Korean. 


> 
> - People in Eastern Europe aren't crazy about this.

I know at least one exception. =:-)


> - Given the lack of unlimited development resources, 
>   I don't really see people
>   willing to change all of the internal APIs to include character set
>   information.  That means we pretty much have to choose one character set
>   for an internal representation inside of nmh.

In fact, I know very little about the API, so it might be difficult.
But restricting the entire nmh to the utf-8 charset would cripple the
system.

Besides putting charset information into the API, the correct
solution from my point of view, though it consumes too many
resources, is to move the conversion module outside, as a separate
program. The default program would convert to utf-8, but anyone could
provide his own program for conversion according to his taste.
Suppose an entry in .mh_profile:
mh-text-convert: prog

This (or something similar) could also handle conversion to and
from html, not only charsets.
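
To make the idea concrete, here is a sketch of what such an external
"prog" might look like. Everything about the interface is an
assumption made for illustration (text on stdin, converted text on
stdout, charsets passed by the caller); the mh-text-convert entry
itself is only the suggestion above, not an existing nmh feature.

```python
import sys

def convert(data, src, dst="utf-8", errors="replace"):
    """Convert bytes from charset src to charset dst; characters that
    cannot be represented are substituted instead of raising."""
    return data.decode(src, errors=errors).encode(dst, errors=errors)

def run_filter(src, dst="utf-8", stdin=None, stdout=None):
    """Filter mode: read all input, write the converted bytes."""
    stdin = stdin or sys.stdin.buffer
    stdout = stdout or sys.stdout.buffer
    stdout.write(convert(stdin.read(), src, dst))

# Byte 0xB1 is a-ogonek in iso-8859-2; in utf-8 it takes two bytes.
print(convert(b"\xb1", "iso-8859-2"))
```

A user who prefers entities, transliteration or anything else would
simply supply a different program with the same interface.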

   Max



Re: [Nmh-workers] repl and mime handling

2012-01-18 Thread Aleksander Matuszak
Joel Uckelman writes:
 
> > > So, I have some thoughts in this direction, but I'm wondering: what do
> > > you want out of repl in terms of better MIME handling?
> > 
> > All the "text" parts turned into UTF-8 and quoted would be a good start.
> > I can then trim down in vi as normal.
> 
> Yes, please. 

*Please, no!* Conversion from any charset to utf-8 is possible, but
conversion back, according to user preferences, is not. People
start to use funny characters like non-breakable space and so on.

The problem seems impossible to solve, not because of technical
difficulty but because of very different user preferences. A much
more flexible mechanism is needed.

Any charset conversion raises the problem of what the sending
charset would be and how to change it. 

> Even just decoding the text parts which are base64 would
> be a huge improvement. 

*Yes, please!* Right now I'm filtering messages via slocal:

default  -  pipe  ?  "/usr/bin/mimedecode|/usr/lib/mh/rcvpack
/var/mail/max -mbox"

which is not the best practice, but it works.
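
For comparison, decoding just the text parts can be sketched with
Python's standard email package. This is not what mimedecode does
internally (I do not know its implementation); it only illustrates the
step of undoing base64 or quoted-printable on text/* parts.

```python
from email import message_from_bytes

def decoded_text_parts(raw_message):
    """Return a list of (charset, text) for every text/* part, with the
    base64 or quoted-printable transfer encoding undone."""
    msg = message_from_bytes(raw_message)
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)   # bytes after decoding
        charset = part.get_content_charset() or "us-ascii"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to ASCII with substitution.
            text = payload.decode("ascii", errors="replace")
        parts.append((charset, text))
    return parts

raw = (b"Content-Type: text/plain; charset=utf-8\n"
       b"Content-Transfer-Encoding: base64\n\naGVsbG8=\n")
print(decoded_text_parts(raw))
```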

   max



[Nmh-workers] vmh and other unused files

2011-12-26 Thread Aleksander Matuszak
I am new to the list. Sorry for a direct (and perhaps stupid)
question.

Can anyone compile vmh from nmh-1.4-RC2? For me it gives a
lot of errors.

I read in the list archive that nmh has something like vmh.
This is what I need, an ncurses interface to nmh, but it is not a part
of the debian nmh package and does not compile. Difficult to test...

   AM
   

