Re: utf8 file corruption after transmission over email

2009-05-08 Thread zion
On Fri, May 08, 2009 at 04:34:14PM -0700, zion wrote:
> Well, I just captured smtp session of loopback interface (same box where
> mutt is running). Here is the relevant part:
>   03d0: 746f 3e38 353c 2f74 6f3e 0d0a 0909 093c  to>85</to>.....<
>   03e0: 7265 6164 3e21 d091 e288 9ae2 9591 3c2f  read>!п.Б..Б..</
>   03f0: 7265 6164 3e0d 0a09 0909 3c77 7269 7465  read>.....<write
> As you can see, this character is already messed up before reaching
> server. So, @gmail is not guilty here ;-).
Turns out it's my locale. Having this set causes the problem:
LC_CTYPE=ru_RU.KOI8-R
If LC_CTYPE is unset, the file doesn't get corrupted.
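
The captured bytes are consistent with the file's UTF-8 bytes having been read as KOI8-R and then converted to UTF-8 a second time. A minimal sketch of that check follows; it assumes (this is not confirmed anywhere in the thread) that mutt treated the attachment as being in the locale's charset. Only the eight bytes between "read>!" and "</read>" in the dump are taken from the thread; the variable names are made up for illustration.

```python
# Bytes captured between "<read>!" and "</read>" in the SMTP dump.
captured = bytes.fromhex("d091e2889ae29591")

# Undo the suspected damage: the captured bytes are valid UTF-8 for three
# characters, and each of those maps back to a single KOI8-R byte.
original = captured.decode("utf-8").encode("koi8_r")   # b'\xe2\x96\xa1'

# Those three bytes are themselves valid UTF-8 for one character.
char = original.decode("utf-8")                        # U+25A1

# Replay the suspected damage in the forward direction:
# UTF-8 bytes misread as KOI8-R, then re-encoded as UTF-8.
replayed = original.decode("koi8_r").encode("utf-8")

print(original.hex(), f"U+{ord(char):04X}", replayed == captured)
```

If the assumption holds, the round trip reproduces the captured bytes exactly, which matches the observation that unsetting LC_CTYPE makes the corruption go away.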


Re: utf8 file corruption after transmission over email

2009-05-08 Thread zion
Well, I just captured smtp session of loopback interface (same box where
mutt is running). Here is the relevant part:
  03d0: 746f 3e38 353c 2f74 6f3e 0d0a 0909 093c  to>85</to>.....<
  03e0: 7265 6164 3e21 d091 e288 9ae2 9591 3c2f  read>!п.Б..Б...

Re: utf8 file corruption after transmission over email

2009-05-08 Thread zion
On Fri, May 08, 2009 at 06:04:42PM -0500, Kyle Wheeler wrote:
> On Friday, May  8 at 03:00 PM, quoth Aaron S.:
> > I have a mystery that I'm trying to solve to no avail.
> 
> Hopefully we can help!
> 
> > I got a little sample XML (utf-8) encoded file that I'm trying to 
> > send as attachment. When I attach it, mutt correctly identifies it: 
> > [text/plain, 8bit, utf-8, 0.3K], since there are non-ASCII 
> > characters, in this case there is only 1 such character.
> 
> Well, actually, that's an incorrect identification. It's NOT a 
> text/plain file, it's an xml file. According to RFC 3023, it should 
> either be sent as application/xml or as text/xml.
> 
> Now, that misidentification shouldn't cause the problem you're having, 
> but correcting it *probably* will fix the problem. I bet that if you 
> add the following to your ~/.mime.types file, the problem goes away:
> 
>  application/xml jff
> 
> > After I send it, this attached file becomes corrupt.
> 
> I tried sending your file to myself, both with and without that line 
> in my mime.types file, and the file didn't get corrupted either way.
> 
> My guess is that this is ACTUALLY your mail server's fault (did you 
> send it through an MSFT Exchange server maybe? They're really bad 
> about this). Here's what I think happened: you have configured mutt to 
> send things in 8-bit mode (i.e. $allow_8bit). Thus, when sending a 
> utf-8 file attachment with an unusual character in it, mutt sent it 
> completely unmodified, because that's supposed to be safe to do when 
> sending in 8-bit mode. But some servers (and I've had this happen more 
> often than not with Exchange servers) attempt to convert all messages 
> into 7-bit form. Unfortunately, they're often very bad at it. I've had 
> several messages corrupted by Exchange servers simply because they 
> couldn't handle curly-quotes correctly. It's happened often enough 
> that I finally just unset allow_8bit so that mutt would always take 
> care of encoding my messages in a 7-bit safe manner, because mutt is 
> so much better at it than they are.
> 
> Anyway, does that help?
Hello,
Well, it was sent from @gmail to another @gmail
account. I have no idea what they run there at google. I thought about
adding that to mime.types and it does work.
What bothers me is that now I have to pay much closer attention as to
when I'm attaching strange files.
I'm gonna have to think of a way to intercept whatever mutt is sending
out to make sure it's not mutt that messes up this 3-byte UTF-8
character.

In any case, thanks for your pointers.
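
For reference, a minimal sketch of what "encoding in a 7-bit safe manner" means for a 3-byte UTF-8 character, using quoted-printable as one such encoding. The sample element content and the character U+25A1 are assumptions chosen for illustration, and the thread does not say whether mutt would pick quoted-printable or base64 for this attachment.

```python
import quopri

# Hypothetical attachment fragment containing one 3-byte UTF-8 character.
raw = "<read>!\u25a1</read>".encode("utf-8")

# Quoted-printable replaces every non-ASCII byte with an "=XX" escape,
# so the result contains only 7-bit bytes.
encoded = quopri.encodestring(raw)
is_seven_bit = all(b < 128 for b in encoded)

# Decoding on the receiving side restores the original bytes exactly.
decoded = quopri.decodestring(encoded)

print(encoded, is_seven_bit, decoded == raw)
```

A relay that only ever sees the 7-bit form has no 8-bit data to convert, which is the reason given above for unsetting allow_8bit.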