Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-03 Thread Kevin J. McCarthy

On Wed, Aug 03, 2022 at 02:37:00PM +0900, Kenichi Asai wrote:

Yes, and it solved the problem!!!  Thank you very much!

I haven't compiled mutt myself in a long time, so I rewrote the homebrew
formula to use the patch and let homebrew recompile (mutt 2.2.3).


Thank you Kenichi and Dennis for confirming the patch fixed the problem. 
That is great news!


I need to look carefully at other uses of isspace() inside Mutt too.

My final fix may be slightly different from the patch, but I will get 
this fixed for 2.2.7.  That release is overdue, so I will try to get 
that out soon.


--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA




Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Dennis Preiser
On Tue, Aug 02, 2022 at 08:55:53PM -0700, Kevin J. McCarthy wrote:
> -  while (ISSPACE (*buf))
> +  while (is_email_wsp (*buf))

I was also able to reproduce the issue (on macOS) and can also confirm
that with this patch, the issue no longer occurs.

Dennis


Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Kenichi Asai
> Darn, that was my best guess.  After I sent the email, I even found some old
> bug reports that 0xa0 was considered "space" on MacOS (e.g.
> https://bugs.python.org/issue7072)
> 
> Still, perhaps there is something different about the way Mutt was built
> versus the quick compile.
> 
> Have you ever built Mutt yourself?  Would you be able to try, and try
> applying the attached patch to a recent release tarball?

Yes, and it solved the problem!!!  Thank you very much!

I haven't compiled mutt myself in a long time, so I rewrote the homebrew
formula to use the patch and let homebrew recompile (mutt 2.2.3).

It's really great that I don't have to check Subject every time any
more to see if the last character is truncated.  Thank you Kevin and
all the others for considering and solving this problem.

Sincerely,

-- 
Kenichi Asai


Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Kevin J. McCarthy

On Wed, Aug 03, 2022 at 12:21:09PM +0900, Kenichi Asai wrote:

asai@bigsur % cat test.c
#include <stdio.h>
#include <ctype.h>

int main ()
{
 printf("%d\n", isspace(0xa0));
 printf("%d\n", isspace(0x85));
 printf("%d\n", isspace(0x0a));

 return 0;
}

[...]

asai@bigsur % ./test
0
0
1
asai@bigsur %


Darn, that was my best guess.  After I sent the email, I even found some 
old bug reports that 0xa0 was considered "space" on MacOS 
(e.g. https://bugs.python.org/issue7072)


Still, perhaps there is something different about the way Mutt was built 
versus the quick compile.
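
One difference worth ruling out, offered as an assumption rather than something established in this thread: a freshly compiled test program starts in the "C" locale, whereas a program that calls setlocale() classifies bytes according to the user's locale. The sketch below probes isspace() under a named locale so the two cases can be compared. The helper name isspace_in_locale is hypothetical, and results for bytes such as 0xa0 and 0x85 depend on the platform and locale.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Report whether isspace() accepts `byte` under the LC_CTYPE locale `name`.
 * Pass "" to use the locale from the environment.
 * Returns 1 (space), 0 (not space), or -1 if the locale cannot be selected.
 * Note: this resets LC_CTYPE to "C" afterwards instead of restoring the
 * previous setting, which is enough for a throwaway probe. */
int isspace_in_locale(const char *name, unsigned char byte)
{
  if (setlocale(LC_CTYPE, name) == NULL)
    return -1;

  int result = isspace(byte) != 0;

  setlocale(LC_CTYPE, "C");
  return result;
}
```

Comparing isspace_in_locale("C", 0xa0) with isspace_in_locale("", 0xa0) shows whether the environment's locale changes the answer on a given system.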


Have you ever built Mutt yourself?  Would you be able to try, and try 
applying the attached patch to a recent release tarball?


--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA
From c2457b789989353db176ca3f899e838d9c67c0c2 Mon Sep 17 00:00:00 2001
From: Kevin McCarthy 
Date: Tue, 2 Aug 2022 20:51:17 -0700
Subject: [PATCH] wip: testing rfcline reader fix.

---
 parse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/parse.c b/parse.c
index 7ed1668f..95dea4bf 100644
--- a/parse.c
+++ b/parse.c
@@ -72,7 +72,7 @@ char *mutt_read_rfc822_line (FILE *f, char *line, size_t *linelen)
 if (*buf == '\n')
 {
   /* we did get a full line. remove trailing space */
-  while (ISSPACE (*buf))
+  while (is_email_wsp (*buf))
 	*buf-- = 0;	/* we cannot come beyond line's beginning because
 			 * it begins with a non-space */
 
-- 
2.35.1





Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Kenichi Asai
> Would you mind creating a script to use for $editor.  Something like:
> 
> - - - - myeditor.sh - - - -
>   #!/bin/bash
> 
>   cp $1 ~/before.txt
>   vim $1
>   cp $1 ~/after.txt
> - - - - end myeditor.sh - - -
> 
> set editor = "~/myeditor.sh"
> 
> Then put 加 at the end of the subject while in Mutt.  I'd like to
> see what before.txt and after.txt look like after vim runs.  Please attach
> them so I can look at the raw bytes.

Attached.  They were identical, both containing 加.

> Also, if you have access to a compiler, would you mind compiling and running
> this quick program:

Here you go:

asai@bigsur % cat test.c
#include <stdio.h>
#include <ctype.h>

int main ()
{
  printf("%d\n", isspace(0xa0));
  printf("%d\n", isspace(0x85));
  printf("%d\n", isspace(0x0a));

  return 0;
}
asai@bigsur % gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
--with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 13.0.0 (clang-1300.0.29.3)
Target: arm64-apple-darwin20.6.0
Thread model: posix
InstalledDir: 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
asai@bigsur % gcc -o test test.c
asai@bigsur % ./test
0
0
1
asai@bigsur %

-- 
Kenichi Asai
From: Kenichi Asai 
To: a...@is.ocha.ac.jp
Cc: 
Bcc: 
Subject: 加
Reply-To: 


-- 
お茶大・理・情報   浅井 健一   a...@is.ocha.ac.jp
From: Kenichi Asai 
To: a...@is.ocha.ac.jp
Cc: 
Bcc: 
Subject: 加
Reply-To: 


-- 
お茶大・理・情報   浅井 健一   a...@is.ocha.ac.jp


Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Kenichi Asai
> > I try to send this e-mail out in 7bit mode (with υ at the end of
> > Subject).
> 
> Why would you do that when the discussion seems to be about UTF-8
> glyphs ?  I'm curious.

I just thought that quoted printable did some work on the Subject
line, but I was wrong.

-- 
Kenichi Asai


Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Kevin J. McCarthy

On Wed, Aug 03, 2022 at 09:53:42AM +0900, Kenichi Asai wrote:

* In the step:
  "- enter some e-mail address and a subject."
if instead, you put a 加 at the end of the subject here, before running vim,
does 加 show up in vim?


Yes.


If you then don't modify the subject while still in
vim, does it show up in Mutt?


No.  It becomes the replacement character.


Thank you for taking the time to answer all my questions.

The subject line you sent appears to be truncated by one byte.  This 
also occurred for the 加 emails earlier.


Would you mind creating a script to use for $editor.  Something like:

- - - - myeditor.sh - - - -
  #!/bin/bash

  cp $1 ~/before.txt
  vim $1
  cp $1 ~/after.txt
- - - - end myeditor.sh - - -

set editor = "~/myeditor.sh"

Then put 加 at the end of the subject while in Mutt.  I'd like to see
what before.txt and after.txt look like after vim runs.  Please attach
them so I can look at the raw bytes.
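
For anyone who wants to inspect the raw bytes locally, here is a minimal sketch of a hex formatter. The function name hex_bytes and its interface are hypothetical and only meant for illustration; any hex dump tool serves the same purpose.

```c
#include <stdio.h>
#include <stddef.h>

/* Format `n` bytes from `buf` as space-separated lowercase hex into `out`.
 * Returns the number of characters written (excluding the terminating NUL),
 * or 0 if `out` is too small (in which case `out` is set to ""). */
size_t hex_bytes(const unsigned char *buf, size_t n, char *out, size_t outsz)
{
  size_t used = 0;

  if (outsz == 0)
    return 0;
  out[0] = '\0';

  for (size_t i = 0; i < n; i++)
  {
    int w = snprintf(out + used, outsz - used, i ? " %02x" : "%02x",
                     (unsigned int) buf[i]);
    if (w < 0 || (size_t) w >= outsz - used)
    {
      out[0] = '\0';
      return 0;
    }
    used += (size_t) w;
  }
  return used;
}
```

Applied to the three UTF-8 bytes of 加, this yields "e5 8a a0", so a missing final a0 in after.txt would be easy to spot.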


Also, if you have access to a compiler, would you mind compiling and 
running this quick program:


- - - - test.c - - - -
#include <stdio.h>
#include <ctype.h>

int main ()
{
  printf("%d\n", isspace(0xa0));
  printf("%d\n", isspace(0x85));
  printf("%d\n", isspace(0x0a));

  return 0;
}
- - - - end test.c - - - -

% gcc -o test test.c
% ./test

Thank you again.

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA




Re: Subject that ends with UTF-8, 85 or A0 : �

2022-08-02 Thread Kenichi Asai
> > This e-mail has υ at the end of Subject.  I will send it out.
> 
> Somehow, the previous e-mail did not contain the replacement character
> at the end of Subject.  I don't know why.  Because the e-mail was
> quoted perhaps?
> 
> [text/plain, quoted, utf-8, 1.3K]
> 
> Bastian's e-mail is 7bit (as is my original e-mail):
> 
> [text/plain, 7bit, us-ascii, 0.7K]
> 
> I try to send this e-mail out in 7bit mode (with υ at the end of
> Subject).

The replacement characters in my previous e-mails do not show up in
mutt's pager (while the one in Bastian's e-mail does; I don't know
why), but if I see those e-mails in gmail, the replacement characters
do appear to exist.

-- 
Kenichi Asai


Re: Subject that ends with UTF-8, 85 or A0 e.g.: �

2022-08-02 Thread Kenichi Asai
> This e-mail has υ at the end of Subject.  I will send it out.

Somehow, the previous e-mail did not contain the replacement character
at the end of Subject.  I don't know why.  Because the e-mail was
quoted perhaps?

[text/plain, quoted, utf-8, 1.3K]

Bastian's e-mail is 7bit (as is my original e-mail):

[text/plain, 7bit, us-ascii, 0.7K]

I try to send this e-mail out in 7bit mode (with υ at the end of
Subject).

> * Does the problem occur if you use a different editor?

Yes.  The same thing happens with:

set editor="emacs %s"

-- 
Kenichi Asai


Re: Subject that ends with UTF-8, 85 or A0 e.g.: �

2022-08-02 Thread Kenichi Asai
Thank you all for considering this issue.

> * After you finish the above steps, what happens if you edit the email again
> in vim?  Does the 加 show up at the end of the subject in vim the second time?

No.  In vim, the character becomes ?? (two question marks).

> * With your original steps, what does the email look like if you send directly
> after returning from vim?

The replacement character is sent as seen in Bastian's e-mail:

Date: Tue, 2 Aug 2022 12:39:06 +0200
From: Bastian 
To: mutt-users@mutt.org
Subject: Re: Subject that ends with UTF-8, 85 or A0 ��

(I am not quite sure why there are two replacement characters here
instead of one.)

> Would you mind sending me an email so I can take a look?

This e-mail has υ at the end of Subject.  I will send it out.

> * Does the problem occur if you use a different editor?

Let me come back to this later.  I need to send this e-mail out to try
a different editor.

> * In the step:
>   "- enter some e-mail address and a subject."
> if instead, you put a 加 at the end of the subject here, before running vim,
> does 加 show up in vim?

Yes.

> If you then don't modify the subject while still in
> vim, does it show up in Mutt?

No.  It becomes the replacement character.

-- 
Kenichi Asai


Re: Subject that ends with UTF-8, 85 or A0 e.g.: υ

2022-08-02 Thread Paul Gilmartin via Mutt-users

On 8/2/22 13:58:37, Kevin J. McCarthy wrote:

On Tue, Aug 02, 2022 at 03:52:51PM +0900, Kenichi Asai wrote:

When the subject ends with a character whose last byte in
UTF-8 is either 85 or A0, it appears the character collapses.

I'm having trouble duplicating this problem on Debian Testing.  So I'll 
need help debugging, from you or anyone else who can duplicate the problem.



Send it to the list, as I am doing with Thunderbird.
The more eyeballs the better.

See: .

--
gil


Re: Subject that ends with UTF-8, 85 or A0

2022-08-02 Thread Kevin J. McCarthy

On Tue, Aug 02, 2022 at 03:52:51PM +0900, Kenichi Asai wrote:

When the subject ends with a character whose last byte in
UTF-8 is either 85 or A0, it appears the character collapses.
To reproduce:

- prepare .mutt/muttrc containing only the following line:
 set edit_headers=yes
- launch mutt and type m to create a new mail.
- enter some e-mail address and a subject.
- vim launches.
- edit Subject line so that it ends with a character such as:
 υ or ％ or ｅ (whose last byte of their UTF-8 code is 85) or
 ム or 加 (whose last byte of their UTF-8 code is A0).
- type :wq to quit vim.
- the last character is converted to something else.


Hi Kenichi,

I'm having trouble duplicating this problem on Debian Testing.  So I'll 
need help debugging, from you or anyone else who can duplicate the 
problem.


* After you finish the above steps, what happens if you edit the email 
again in vim?  Does the 加 show up at the end of the subject in vim the 
second time?


* With your original steps, what does the email look like if you send 
directly after returning from vim?  Would you mind sending me an email 
so I can take a look?


* Does the problem occur if you use a different editor?

* In the step:
  "- enter some e-mail address and a subject."
if instead, you put a 加 at the end of the subject here, before running 
vim, does 加 show up in vim?  If you then don't modify the subject while 
still in vim, does it show up in Mutt?


--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA




Re: Subject that ends with UTF-8, 85 or A0 �

2022-08-02 Thread Dennis Preiser
On Tue, Aug 02, 2022 at 11:17:14PM +0700, Đoàn Trần Công Danh wrote:
> On 2022-08-02 13:07:42+0200, Dennis Preiser  wrote:
>> On Tue, Aug 02, 2022 at 12:39:06PM +0200, Bastian wrote:
>> > Maybe some local vim/encoding issues on darwin?
>> 
>> Maybe. Interestingly, if I change the subject via 's' before sending in
>> mutt and replace the two replacement characters with 0x52a0 it works.
> 
> I noticed that your email was sent with the Content-Type charset
> us-ascii, while Kenichi's email uses iso-2022-jp.
> 
> Is that part of the problem? I.e., the editor thinks the text file is
> a us-ascii or iso-2022-jp encoded file and converts uninterpretable
> characters to the UTF-8 replacement character?

I don't think so. If the editor thinks the file is us-ascii, then it
must not insert a UTF-8 replacement character.

Furthermore, this only happens if the character in question is at the
end of the subject. If it is at the beginning or somewhere in the middle
of the subject everything is fine. In other words, if the subject
consists of two 0x52a0 (Subject: 加加), the first one remains intact and
the second one is replaced by the UTF-8 replacement character.
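
This end-of-line-only behaviour is what one would expect if trailing "whitespace" is stripped with a predicate that accepts 0x85 or 0xA0. The following is a minimal sketch under that assumption, not Mutt's code: wide_wsp is a stand-in for a locale-sensitive isspace(), ascii_wsp is a hypothetical ASCII-only check, and strip_trailing is an illustrative helper.

```c
#include <stddef.h>
#include <string.h>

/* ASCII-only whitespace check; never consults the locale. */
int ascii_wsp(unsigned char c)
{
  return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Stand-in for an isspace() that also treats 0x85 and 0xA0 as whitespace. */
int wide_wsp(unsigned char c)
{
  return ascii_wsp(c) || c == 0x85 || c == 0xA0;
}

/* Remove trailing bytes accepted by `is_ws`; returns the new length. */
size_t strip_trailing(char *line, int (*is_ws)(unsigned char))
{
  size_t len = strlen(line);

  while (len > 0 && is_ws((unsigned char) line[len - 1]))
    line[--len] = '\0';
  return len;
}
```

With "Subject: 加加\n" (加 is E5 8A A0 in UTF-8), wide_wsp removes the newline and the final 0xA0, leaving a broken E5 8A sequence at the end while the first 加 is untouched. ascii_wsp removes only the newline.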

Dennis


Re: Subject that ends with UTF-8, 85 or A0 加

2022-08-02 Thread Đoàn Trần Công Danh
On 2022-08-02 13:07:42+0200, Dennis Preiser  wrote:
> On Tue, Aug 02, 2022 at 12:39:06PM +0200, Bastian wrote:
> > I see it in the subject now.
> > There are two U+FFFD chars.
> > 
> >> | System: Darwin 21.6.0 (arm64)
> >> | ncurses: ncurses 6.3.20220625 (compiled with 6.3)
> >> | libiconv: 1.16
> >> | hcache backend: lmdb LMDB 0.9.70: (December 19, 2015)
> >> | 
> >> | Compiler:
> >> | Apple clang version 13.1.6 (clang-1316.0.21.2.5)
> >> | Target: arm64-apple-darwin21.5.0
> >> | Thread model: posix
> >> | [...]
> > 
> > Maybe some local vim/encoding issues on darwin?
> 
> Maybe. Interestingly, if I change the subject via 's' before sending in
> mutt and replace the two replacement characters with 0x52a0 it works.

I noticed that your email was sent with the Content-Type charset
us-ascii, while Kenichi's email uses iso-2022-jp.

Is that part of the problem? I.e., the editor thinks the text file is
a us-ascii or iso-2022-jp encoded file and converts uninterpretable
characters to the UTF-8 replacement character?


-- 
Danh


Re: Subject that ends with UTF-8, 85 or A0 加

2022-08-02 Thread Dennis Preiser
On Tue, Aug 02, 2022 at 12:39:06PM +0200, Bastian wrote:
> I see it in the subject now.
> There are two U+FFFD chars.
> 
>> | System: Darwin 21.6.0 (arm64)
>> | ncurses: ncurses 6.3.20220625 (compiled with 6.3)
>> | libiconv: 1.16
>> | hcache backend: lmdb LMDB 0.9.70: (December 19, 2015)
>> | 
>> | Compiler:
>> | Apple clang version 13.1.6 (clang-1316.0.21.2.5)
>> | Target: arm64-apple-darwin21.5.0
>> | Thread model: posix
>> | [...]
> 
> Maybe some local vim/encoding issues on darwin?

Maybe. Interestingly, if I change the subject via 's' before sending in
mutt and replace the two replacement characters with 0x52a0 it works.

Dennis


Re: Subject that ends with UTF-8, 85 or A0 ��

2022-08-02 Thread Bastian
> > I can't reproduce either.
> 
> I can reproduce the issue. In vim the character 0x52a0 is still present:
> 
> 
> 
> After quitting vim, mutt displays the unicode replacement character
> 0xfffd instead of 0x52a0:

I see it in the subject now.
There are two U+FFFD chars.

> | System: Darwin 21.6.0 (arm64)
> | ncurses: ncurses 6.3.20220625 (compiled with 6.3)
> | libiconv: 1.16
> | hcache backend: lmdb LMDB 0.9.70: (December 19, 2015)
> | 
> | Compiler:
> | Apple clang version 13.1.6 (clang-1316.0.21.2.5)
> | Target: arm64-apple-darwin21.5.0
> | Thread model: posix
> | [...]

Maybe some local vim/encoding issues on darwin?

-- 
Bastian


Re: Subject that ends with UTF-8, 85 or A0 �

2022-08-02 Thread Dennis Preiser
On Tue, Aug 02, 2022 at 04:04:07PM +0700, Đoàn Trần Công Danh wrote:
> On 2022-08-02 10:06:08+0200, Bastian  wrote:
>> On 02Aug22 15:52+0900, Kenichi Asai wrote:
>> > Would it be possible to somehow avoid this problem?  I cannot avoid
>> > creating e-mails with Japanese characters in Subject and this problem
>> > bugs me quite much.
>> 
>> I think I am not able to reproduce this behavior. I appended 0x52A0 to 
>> the subject line. Can you verify?
> 
> I can't reproduce either.

I can reproduce the issue. In vim the character 0x52a0 is still present:



After quitting vim, mutt displays the unicode replacement character
0xfffd instead of 0x52a0:



Finally, in the e-mail as such this character has completely
disappeared:



| % mutt -v
| Mutt 2.2.6 (2022-06-05)
| Copyright (C) 1996-2022 Michael R. Elkins and others.
| Mutt comes with ABSOLUTELY NO WARRANTY; for details type `mutt -vv'.
| Mutt is free software, and you are welcome to redistribute it
| under certain conditions; type `mutt -vv' for details.
| 
| System: Darwin 21.6.0 (arm64)
| ncurses: ncurses 6.3.20220625 (compiled with 6.3)
| libiconv: 1.16
| hcache backend: lmdb LMDB 0.9.70: (December 19, 2015)
| 
| Compiler:
| Apple clang version 13.1.6 (clang-1316.0.21.2.5)
| Target: arm64-apple-darwin21.5.0
| Thread model: posix
| [...]

Dennis


Re: Subject that ends with UTF-8, 85 or A0 加

2022-08-02 Thread Bastian
On 02Aug22 16:04+0700, Đoàn Trần Công Danh wrote:
> On 2022-08-02 10:06:08+0200, Bastian  wrote:
> > υ 0x3C5
> > ％ 0xFF05
> > ｅ 0xFF45
> > ム 0x30E0
> > 加 0x52A0
> > 
> > So only the last matches your description 'last byte is A0'
> 
> I think he meant the last byte of their UTF-8 representation.
> For example, υ (U+03C5) is encoded as 0xCF 0x85, which ends with 0x85.

Thanks. TIL some more about UTF-8. When the cursor is on top of a
character, vim shows me the character value as '0x3C5', which is the
Unicode code point. In the UTF-8 chart it is listed as U+03C5, and
the corresponding UTF-8 bytes are 0xCF 0x85.
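
To make the code point versus UTF-8 byte distinction concrete, here is a small sketch that encodes a code point and reports its last byte. The function names utf8_encode_bmp and utf8_last_byte are hypothetical; the encoder only covers code points up to U+FFFF and does not reject surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a code point up to U+FFFF as UTF-8 into out[].
 * Returns the number of bytes written (1 to 3), or 0 if cp is out of range. */
size_t utf8_encode_bmp(uint32_t cp, unsigned char out[3])
{
  if (cp < 0x80)
  {
    out[0] = (unsigned char) cp;
    return 1;
  }
  if (cp < 0x800)
  {
    out[0] = (unsigned char) (0xC0 | (cp >> 6));
    out[1] = (unsigned char) (0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000)
  {
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
  }
  return 0;
}

/* Last byte of the UTF-8 encoding of cp, or 0 if cp is out of range. */
unsigned char utf8_last_byte(uint32_t cp)
{
  unsigned char buf[3];
  size_t n = utf8_encode_bmp(cp, buf);

  return n ? buf[n - 1] : 0;
}
```

For the five examples in this thread, U+03C5, U+FF05 and U+FF45 end in 0x85, while U+30E0 and U+52A0 end in 0xA0.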

-- 
Bastian


Re: Subject that ends with UTF-8, 85 or A0 加

2022-08-02 Thread Đoàn Trần Công Danh
On 2022-08-02 10:06:08+0200, Bastian  wrote:
> On 02Aug22 15:52+0900, Kenichi Asai wrote:
> > - prepare .mutt/muttrc containing only the following line:
> >   set edit_headers=yes
> > - launch mutt and type m to create a new mail.
> > - enter some e-mail address and a subject.
> > - vim launches.
> > - edit Subject line so that it ends with a character such as:
> >   υ or ％ or ｅ (whose last byte of their UTF-8 code is 85) or
> >   ム or 加 (whose last byte of their UTF-8 code is A0).
> 
> The mail I received shows the characters you gave here as examples as:
> 
> υ 0x3C5
> ％ 0xFF05
> ｅ 0xFF45
> ム 0x30E0
> 加 0x52A0
> 
> So only the last matches your description 'last byte is A0'

I think he meant the last byte of their UTF-8 representation.
For example, υ (U+03C5) is encoded as 0xCF 0x85, which ends with 0x85.


> I cannot see them currently, my font does not support them. But anyways 
> I couldn't verify if these would be the characters you intended to send.
> 
> > Would it be possible to somehow avoid this problem?  I cannot avoid
> > creating e-mails with Japanese characters in Subject and this problem
> > bugs me quite much.
> 
> I think I am not able to reproduce this behavior. I appended 0x52A0 to 
> the subject line. Can you verify?

I can't reproduce either.

-- 
Danh


Re: Subject that ends with UTF-8, 85 or A0 加

2022-08-02 Thread Bastian
On 02Aug22 15:52+0900, Kenichi Asai wrote:
> - prepare .mutt/muttrc containing only the following line:
>   set edit_headers=yes
> - launch mutt and type m to create a new mail.
> - enter some e-mail address and a subject.
> - vim launches.
> - edit Subject line so that it ends with a character such as:
>   υ or ％ or ｅ (whose last byte of their UTF-8 code is 85) or
>   ム or 加 (whose last byte of their UTF-8 code is A0).

The mail I received shows the characters you gave here as examples as:

υ 0x3C5
％ 0xFF05
ｅ 0xFF45
ム 0x30E0
加 0x52A0

So only the last matches your description 'last byte is A0'
I cannot see them currently, my font does not support them. But anyways 
I couldn't verify if these would be the characters you intended to send.

> Would it be possible to somehow avoid this problem?  I cannot avoid
> creating e-mails with Japanese characters in Subject and this problem
> bugs me quite much.

I think I am not able to reproduce this behavior. I appended 0x52A0 to 
the subject line. Can you verify?


Cheers,
-- 
Bastian


Subject that ends with UTF-8, 85 or A0

2022-08-02 Thread Kenichi Asai
When the subject ends with a character whose last byte in
UTF-8 is either 85 or A0, it appears the character collapses.
To reproduce:

- prepare .mutt/muttrc containing only the following line:
  set edit_headers=yes
- launch mutt and type m to create a new mail.
- enter some e-mail address and a subject.
- vim launches.
- edit Subject line so that it ends with a character such as:
  υ or ％ or ｅ (whose last byte of their UTF-8 code is 85) or
  ム or 加 (whose last byte of their UTF-8 code is A0).
- type :wq to quit vim.
- the last character is converted to something else.

Would it be possible to somehow avoid this problem?  I cannot avoid
creating e-mails with Japanese characters in Subject and this problem
bugs me quite much.

I am using Mutt 2.2.3 (2022-04-12) installed from homebrew on MacOS
Big Sur 11.6.7 on iMac (2019).

Sincerely,

-- 
Kenichi Asai