Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-03 Thread Kevin J. McCarthy

On Wed, Aug 03, 2022 at 02:37:00PM +0900, Kenichi Asai wrote:

> Yes, and it solved the problem!!!  Thank you very much!
>
> I haven't compiled mutt myself in a long time, so I rewrote the homebrew
> formula to use the patch and let homebrew recompile (mutt 2.2.3).


Thank you Kenichi and Dennis for confirming the patch fixed the problem. 
That is great news!


I need to look carefully at other uses of isspace() inside Mutt too.

My final fix may be slightly different from the patch, but I will get 
this fixed for 2.2.7.  That release is overdue, so I will try to get 
that out soon.


--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA


signature.asc
Description: PGP signature


Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Dennis Preiser
On Tue, Aug 02, 2022 at 08:55:53PM -0700, Kevin J. McCarthy wrote:
> -  while (ISSPACE (*buf))
> +  while (is_email_wsp (*buf))

I was also able to reproduce the issue (on macOS) and can also confirm
that with this patch, the issue no longer occurs.

Dennis


Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Kenichi Asai
> Darn, that was my best guess.  After I sent the email, I even found some old
> bug reports that 0xa0 was considered "space" on MacOS (e.g.
> https://bugs.python.org/issue7072)
> 
> Still, perhaps there is something different about the way Mutt was built
> versus the quick compile.
> 
> Have you ever built Mutt yourself?  Would you be able to try, and try
> applying the attached patch to a recent release tarball?

Yes, and it solved the problem!!!  Thank you very much!

I haven't compiled mutt myself in a long time, so I rewrote the homebrew
formula to use the patch and let homebrew recompile (mutt 2.2.3).

It's really great that I no longer have to check the Subject every time
to see whether the last character is truncated.  Thank you Kevin and
all the others for considering and solving this problem.

Sincerely,

-- 
Kenichi Asai


Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Kevin J. McCarthy

On Wed, Aug 03, 2022 at 12:21:09PM +0900, Kenichi Asai wrote:

> asai@bigsur % cat test.c
> #include <stdio.h>
> #include <ctype.h>
>
> int main ()
> {
>  printf("%d\n", isspace(0xa0));
>  printf("%d\n", isspace(0x85));
>  printf("%d\n", isspace(0x0a));
>
>  return 0;
> }
>
> [...]
>
> asai@bigsur % ./test
> 0
> 0
> 1
> asai@bigsur %


Darn, that was my best guess.  After I sent the email, I even found some 
old bug reports that 0xa0 was considered "space" on MacOS 
(e.g. https://bugs.python.org/issue7072)


Still, perhaps there is something different about the way Mutt was built 
versus the quick compile.


Have you ever built Mutt yourself?  Would you be able to try, and try 
applying the attached patch to a recent release tarball?


--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA
From c2457b789989353db176ca3f899e838d9c67c0c2 Mon Sep 17 00:00:00 2001
From: Kevin McCarthy 
Date: Tue, 2 Aug 2022 20:51:17 -0700
Subject: [PATCH] wip: testing rfcline reader fix.

---
 parse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/parse.c b/parse.c
index 7ed1668f..95dea4bf 100644
--- a/parse.c
+++ b/parse.c
@@ -72,7 +72,7 @@ char *mutt_read_rfc822_line (FILE *f, char *line, size_t *linelen)
 if (*buf == '\n')
 {
   /* we did get a full line. remove trailing space */
-  while (ISSPACE (*buf))
+  while (is_email_wsp (*buf))
 	*buf-- = 0;	/* we cannot come beyond line's beginning because
 			 * it begins with a non-space */
 
-- 
2.35.1





Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Kenichi Asai
> Would you mind creating a script to use for $editor.  Something like:
> 
> - - - - myeditor.sh - - - -
>   #!/bin/bash
> 
>   cp "$1" ~/before.txt
>   vim "$1"
>   cp "$1" ~/after.txt
> - - - - end myeditor.sh - - -
> 
> set editor = "~/myeditor.sh"
> 
> Then put 加 at the end of the subject while in Mutt; I'd like to
> see what before.txt and after.txt look like after vim runs.  Please attach
> them so I can look at the raw bytes.

Attached.  They were identical, both containing 加.

> Also, if you have access to a compiler, would you mind compiling and running
> this quick program:

Here you go:

asai@bigsur % cat test.c
#include <stdio.h>
#include <ctype.h>

int main ()
{
  printf("%d\n", isspace(0xa0));
  printf("%d\n", isspace(0x85));
  printf("%d\n", isspace(0x0a));

  return 0;
}
asai@bigsur % gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 13.0.0 (clang-1300.0.29.3)
Target: arm64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
asai@bigsur % gcc -o test test.c
asai@bigsur % ./test
0
0
1
asai@bigsur %

-- 
Kenichi Asai
From: Kenichi Asai 
To: a...@is.ocha.ac.jp
Cc: 
Bcc: 
Subject: 加
Reply-To: 


-- 
Ochanomizu University, Faculty of Science, Information Science   Kenichi Asai   a...@is.ocha.ac.jp
From: Kenichi Asai 
To: a...@is.ocha.ac.jp
Cc: 
Bcc: 
Subject: 加
Reply-To: 


-- 
Ochanomizu University, Faculty of Science, Information Science   Kenichi Asai   a...@is.ocha.ac.jp


Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Kenichi Asai
> > I will try sending this e-mail in 7bit mode (with υ at the end of
> > Subject).
> 
> Why would you do that when the discussion seems to be about UTF-8
> glyphs?  I'm curious.

I just thought that quoted-printable did some work on the Subject
line, but I was wrong.

-- 
Kenichi Asai


Re: Subject that ends with UTF-8, 85 or A0 e.g.:

2022-08-02 Thread Kevin J. McCarthy

On Wed, Aug 03, 2022 at 09:53:42AM +0900, Kenichi Asai wrote:

> > * In the step:
> >   "- enter some e-mail address and a subject."
> > if instead, you put a 加 at the end of the subject here, before running vim,
> > does 加 show up in vim?
>
> Yes.
>
> > If you then don't modify the subject while still in
> > vim, does it show up in Mutt?
>
> No.  It becomes the replacement character.


Thank you for taking the time to answer all my questions.

The subject line you sent appears to be truncated by one byte.  This 
also occurred for the 加 emails earlier.


Would you mind creating a script to use for $editor.  Something like:

- - - - myeditor.sh - - - -
  #!/bin/bash

  cp "$1" ~/before.txt
  vim "$1"
  cp "$1" ~/after.txt
- - - - end myeditor.sh - - -

set editor = "~/myeditor.sh"

Then put 加 at the end of the subject while in Mutt; I'd like to see
what before.txt and after.txt look like after vim runs.  Please attach
them so I can look at the raw bytes.


Also, if you have access to a compiler, would you mind compiling and 
running this quick program:


- - - - test.c - - - -
#include <stdio.h>
#include <ctype.h>

int main ()
{
  printf("%d\n", isspace(0xa0));
  printf("%d\n", isspace(0x85));
  printf("%d\n", isspace(0x0a));

  return 0;
}
- - - - end test.c - - - -

% gcc -o test test.c
% ./test

Thank you again.

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA




Re: Subject that ends with UTF-8, 85 or A0 e.g.: �

2022-08-02 Thread Kenichi Asai
> This e-mail has υ at the end of Subject.  I will send it out.

Somehow, the previous e-mail did not contain the replacement character
at the end of Subject.  I don't know why.  Perhaps because the e-mail was
quoted-printable?

[text/plain, quoted, utf-8, 1.3K]

Bastian's e-mail is 7bit (as is my original e-mail):

[text/plain, 7bit, us-ascii, 0.7K]

I will try sending this e-mail in 7bit mode (with υ at the end of
Subject).

> * Does the problem occur if you use a different editor?

Yes.  The same thing happens with:

set editor="emacs %s"

-- 
Kenichi Asai


Re: Subject that ends with UTF-8, 85 or A0 e.g.: �

2022-08-02 Thread Kenichi Asai
Thank you all for considering this issue.

> * After you finish the above steps, what happens if you edit the email again
> in vim?  Does the 加 show up at the end of the subject in vim the second time?

No.  In vim, the character becomes ?? (two question marks).

> * With your original steps, what does the email look like if you send directly
> after returning from vim?

The replacement character is sent as seen in Bastian's e-mail:

Date: Tue, 2 Aug 2022 12:39:06 +0200
From: Bastian 
To: mutt-users@mutt.org
Subject: Re: Subject that ends with UTF-8, 85 or A0 ��

(I am not quite sure why there are two replacement characters here
instead of one.)

> Would you mind sending me an email so I can take a look?

This e-mail has υ at the end of Subject.  I will send it out.

> * Does the problem occur if you use a different editor?

Let me come back to this later.  I need to send this e-mail out to try
a different editor.

> * In the step:
>   "- enter some e-mail address and a subject."
> if instead, you put a 加 at the end of the subject here, before running vim,
> does 加 show up in vim?

Yes.

> If you then don't modify the subject while still in
> vim, does it show up in Mutt?

No.  It becomes the replacement character.

-- 
Kenichi Asai


Re: Subject that ends with UTF-8, 85 or A0 e.g.: υ

2022-08-02 Thread Paul Gilmartin via Mutt-users

On 8/2/22 13:58:37, Kevin J. McCarthy wrote:

> On Tue, Aug 02, 2022 at 03:52:51PM +0900, Kenichi Asai wrote:
>
>> When the subject ends with a character whose last byte in
>> UTF-8 is either 85 or A0, it appears the character collapses.
>
> I'm having trouble duplicating this problem on Debian Testing.  So I'll
> need help debugging, from you or anyone else who can duplicate the problem.



Send it to the list, as I am doing with Thunderbird.
The more eyeballs the better.

See: .

--
gil