Re: Mutt guessing wrong encoding for outgoing PDFs?

2002-09-10 Thread Brian Grayson

On Mon, Sep 09, 2002 at 09:42:21AM -0700, Michael Elkins wrote:
> Brian Grayson wrote:
> > I downloaded 1.4 on Friday just to see, and the same problem
> > occurs.  The fundamental problem is once the CTE code sees a
> > nonzero value of lobin, it goes into quoted, regardless of
> > whether hibin is nonzero.  The following patch does the right
> > thing for my testcase here, but I don't know if there's a good
> > reason why the lobin/quotable check currently ignores whether
> > there are any hibins or not.
> >
> > After a bit of inspection, the file rep.5k has hibins and
> > _no_ lobins, and hence goes properly into 8bit encoding.  But
> > the file rep1k has a lobin (0x0b at offset 0x340, for example),
> > so it short-circuits into quoted-printable.  Try mailing the
> > base64-encoded version of that to yourself, and it should
> > choose quotable, even in 1.4.
>
> Thanks for the extra info.  I looked into this more closely, and I see
> that there are a couple of factors that come into play in this
> situation.  First, I noticed that your PDF attachment was labeled
> improperly as text/plain.  This is not so bad in itself, but the
> piece of code that checks for which transfer encoding to use assumes
> that it really is text, which is a problem.  Since there was no
> extension to the file, Mutt fell back to guessing whether the file
> was of type text/plain or application/octet-stream.  Mutt
> guessed text/plain because it saw only a few lobins in the file.
> However, Mutt failed to notice that there were bare CRs in the file when
> choosing the transfer encoding.  The attached patch checks info->binary
> even for the text/plain case.  I just tested this and it correctly chose
> base64 encoding for the file.

  Argggh!  I found out the fundamental problem.  It's not with
the encoding type -- quoted-printable should be fine even in
the presence of 8-bit characters (right?), except we have an Exchange
server as our mail server.  The Exchange mail server is
apparently un-encoding the quoted-printable attachment, and
then re-encoding it buggily.

  I visually verified this by telnet'ing to the SMTP port, and
cut-and-pasting a MIME mail with a quoted-printable attachment.
If I send that mail to \bgrayson, I get different results than
if the mail goes through our Exchange server.  So it appears to
me that Exchange goes into the mail message and mucks around,
and manages to also corrupt some mail while it's in there.

  For example, I sent (and received from \bgrayson):
%PDF-1.2=0D%=E2=E3=CF=D3=0D=0A317 0 obj=0D =0D/Linearized 1 =0D/O 319 =0D=/H [ 
728 767 ] =0D/L 363450 =0D/E 62838 =0D/N 100 =0D/T 356991 =0D =0Dend=
obj=0D xref=0D317 16 =  

  When I let Exchange touch it, I end up with:
%PDF-1.2=0D%=E2=E3=CF=D3
317 0 obj=0D =0D/Linearized 1 =0D/O 319 =0D/H [ 728 767 ] =0D/L =
363450 =0D/E 62838 =0D/N 100 =0D/T 356991 =0D =0Dendobj=0D=

  So, for a solution, is there an easy way for me to tell mutt,
"Never use quoted-printable, because the world unfortunately has
Exchange servers"?  Has anyone else seen this problem?

  Thanks.  And sorry about the wild goose chase -- I didn't
realize until now that quoted-printable should be able to
handle arbitrary binaries without corruption (at least I
_think_ it should be able to do so).
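
For reference, a minimal sketch of why quoted-printable can carry
arbitrary bytes: every byte that is not plain printable ASCII,
including CR and LF, is written as =XX, and soft line breaks keep
encoded lines short.  This is not Mutt's encoder; the function name
qp_encode_binary and its interface are made up for illustration, and
it encodes spaces as =20 for simplicity, which is allowed but not
required.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Mutt's encoder: quoted-printable-encode `len`
 * bytes of `in` as opaque binary data and write a NUL-terminated string
 * to `out`.  Returns the encoded length, or (size_t) -1 if `out` is too
 * small. */
size_t qp_encode_binary (const unsigned char *in, size_t len,
                         char *out, size_t outsz)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t o = 0;    /* bytes written to out so far */
  size_t col = 0;  /* characters on the current encoded line */

  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < len; i++)
  {
    unsigned char c = in[i];
    /* Printable ASCII other than '=' passes through; everything else,
     * including CR, LF, TAB and space, becomes =XX. */
    int literal = (c >= 33 && c <= 126 && c != '=');
    size_t need = literal ? 1 : 3;

    /* Soft line break ("=" CRLF) so that no encoded line exceeds
     * 76 characters. */
    if (col + need > 75)
    {
      if (o + 3 >= outsz)
        return (size_t) -1;
      memcpy (out + o, "=\r\n", 3);
      o += 3;
      col = 0;
    }
    if (o + need >= outsz)
      return (size_t) -1;
    if (literal)
      out[o++] = (char) c;
    else
    {
      out[o++] = '=';
      out[o++] = hex[c >> 4];
      out[o++] = hex[c & 0x0F];
    }
    col += need;
  }
  if (o >= outsz)
    return (size_t) -1;
  out[o] = '\0';
  return o;
}
```

Feeding it the 16 bytes "%PDF-1.2", CR, "%", 0xE2 0xE3 0xCF 0xD3, CR,
LF gives %PDF-1.2=0D%=E2=E3=CF=D3=0D=0A, the same prefix as the
uncorrupted sample above.  A decoder that maps each =XX back to one
byte recovers the input exactly, provided nothing in transit rewrites
the encoded text.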

  (Microsoft just lost more respect from me.  Which is amazing,
since I didn't think there was any more to lose!)

  Brian



Re: Mutt guessing wrong encoding for outgoing PDFs?

2002-09-09 Thread Brian Grayson

On Fri, Sep 06, 2002 at 11:21:09PM -0700, Michael Elkins wrote:
> Brian Grayson wrote:
> > Hm.  I have 1.2.5 source locally, and it looks like in
> > mutt_set_encoding() in sendlib.c, the following logic may be
> > faulty:
>
> I just noticed that you are using an extremely ancient version of Mutt
> (0.95).  Please try using Mutt 1.4, which is the current stable version.
> The logic for picking the CTE is much more complex now, and it should
> address your issue.

  I downloaded 1.4 on Friday just to see, and the same problem
occurs.  The fundamental problem is once the CTE code sees a
nonzero value of lobin, it goes into quoted, regardless of
whether hibin is nonzero.  The following patch does the right
thing for my testcase here, but I don't know if there's a good
reason why the lobin/quotable check currently ignores whether
there are any hibins or not.

  After a bit of inspection, the file rep.5k has hibins and
_no_ lobins, and hence goes properly into 8bit encoding.  But
the file rep1k has a lobin (0x0b at offset 0x340, for example),
so it short-circuits into quoted-printable.  Try mailing the
base64-encoded version of that to yourself, and it should
choose quotable, even in 1.4.

  Brian
-- 
Brian Grayson, SysPerf (System Performance, Modeling, and Simulation)
Somerset Design Center
Motorola
Austin, TX


--- sendlib.c   Sat Apr 20 02:25:49 2002
+++ sendlib.c.mod   Fri Sep  6 21:27:18 2002
@@ -1196,10 +1196,12 @@
   if (b->type == TYPETEXT)
   {
     char *chsname = mutt_get_body_charset (send_charset, sizeof (send_charset), b);
-    if ((info->lobin && strncasecmp (chsname, "iso-2022", 8)) || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
-      b->encoding = ENCQUOTEDPRINTABLE;
-    else if (info->hibin)
+    if (info->hibin)
+    {
       b->encoding = option (OPTALLOW8BIT) ? ENC8BIT : ENCQUOTEDPRINTABLE;
+    }
+    else if ((info->lobin && strncasecmp (chsname, "iso-2022", 8)) || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
+      b->encoding = ENCQUOTEDPRINTABLE;
     else
       b->encoding = ENC7BIT;
   }



Re: Mutt guessing wrong encoding for outgoing PDFs?

2002-09-09 Thread Michael Elkins

Brian Grayson wrote:
> I downloaded 1.4 on Friday just to see, and the same problem
> occurs.  The fundamental problem is once the CTE code sees a
> nonzero value of lobin, it goes into quoted, regardless of
> whether hibin is nonzero.  The following patch does the right
> thing for my testcase here, but I don't know if there's a good
> reason why the lobin/quotable check currently ignores whether
> there are any hibins or not.
>
> After a bit of inspection, the file rep.5k has hibins and
> _no_ lobins, and hence goes properly into 8bit encoding.  But
> the file rep1k has a lobin (0x0b at offset 0x340, for example),
> so it short-circuits into quoted-printable.  Try mailing the
> base64-encoded version of that to yourself, and it should
> choose quotable, even in 1.4.

Thanks for the extra info.  I looked into this more closely, and I see
that there are a couple of factors that come into play in this
situation.  First, I noticed that your PDF attachment was labeled
improperly as text/plain.  This is not so bad in itself, but the
piece of code that checks for which transfer encoding to use assumes
that it really is text, which is a problem.  Since there was no
extension to the file, Mutt fell back to guessing whether the file
was of type text/plain or application/octet-stream.  Mutt
guessed text/plain because it saw only a few lobins in the file.
However, Mutt failed to notice that there were bare CRs in the file when
choosing the transfer encoding.  The attached patch checks info->binary
even for the text/plain case.  I just tested this and it correctly chose
base64 encoding for the file.
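
A rough sketch of the kind of byte statistics being discussed, for
anyone following along without the source at hand.  The struct and
function names here (content_stats, scan_content) are invented for
illustration and only loosely modelled on the info->lobin, info->hibin
and info->binary fields named in this thread; the exact rules Mutt
applies may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statistics record; not Mutt's CONTENT struct. */
struct content_stats
{
  size_t lobin;    /* control bytes other than TAB, CR, LF, NUL */
  size_t hibin;    /* bytes with the high bit set */
  size_t linemax;  /* longest line in bytes, excluding CRLF or LF */
  bool binary;     /* saw a NUL, or a CR not followed by LF */
};

struct content_stats scan_content (const unsigned char *buf, size_t len)
{
  struct content_stats st = { 0, 0, 0, false };
  size_t linelen = 0;

  assert (buf != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
  {
    unsigned char c = buf[i];

    if (c == '\n')
    {
      if (linelen > st.linemax)
        st.linemax = linelen;
      linelen = 0;
      continue;
    }
    if (c == '\r')
    {
      if (i + 1 < len && buf[i + 1] == '\n')
        continue;              /* the CR of a CRLF line ending */
      st.binary = true;        /* bare CR: not a text line ending */
    }
    else if (c == '\0')
      st.binary = true;
    else if (c >= 0x80)
      st.hibin++;
    else if ((c < 0x20 && c != '\t') || c == 0x7f)
      st.lobin++;
    linelen++;
  }
  if (linelen > st.linemax)
    st.linemax = linelen;
  return st;
}
```

On a buffer that starts like the PDF in question (bare CRs, a few
high-bit bytes, and a stray 0x0b), this sets binary and reports
nonzero hibin and lobin, which is the combination the text/plain
branch has to cope with.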


Index: sendlib.c
===
RCS file: /home/roessler/cvs/mutt/sendlib.c,v
retrieving revision 2.94.2.5
diff -u -r2.94.2.5 sendlib.c
--- sendlib.c   31 May 2002 16:59:39 -  2.94.2.5
+++ sendlib.c   9 Sep 2002 16:32:21 -
@@ -1196,7 +1196,17 @@
   if (b->type == TYPETEXT)
   {
     char *chsname = mutt_get_body_charset (send_charset, sizeof (send_charset), b);
-    if ((info->lobin && strncasecmp (chsname, "iso-2022", 8)) || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
+
+    /*
+     * given a lack of info about what the file is from the mime-types file,
+     * Mutt will make a guess as to whether or not the file is likely
+     * text/plain or application/octet-stream based upon statistical
+     * evidence.  It is still possible that a binary file (one with a bare
+     * CR) might occur, so we need to account for it here.
+     */
+    if (info->binary)
+      b->encoding = ENCBASE64;
+    else if ((info->lobin && strncasecmp (chsname, "iso-2022", 8)) || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
       b->encoding = ENCQUOTEDPRINTABLE;
     else if (info->hibin)
       b->encoding = option (OPTALLOW8BIT) ? ENC8BIT : ENCQUOTEDPRINTABLE;



Re: Mutt guessing wrong encoding for outgoing PDFs?

2002-09-09 Thread Thomas Roessler

On 2002-09-09 09:42:21 -0700, Michael Elkins wrote:

> Thanks for the extra info.  I looked into this more closely, and I
> see that there are a couple of factors that come into play in
> this situation.  First, I noticed that your PDF attachment was
> labeled improperly as text/plain.  This is not so bad in itself,
> but the piece of code that checks for which transfer encoding to
> use assumes that it really is text, which is a problem.  Since
> there was no extension to the file, Mutt fell back to guessing
> whether the file was of type text/plain or
> application/octet-stream.  Mutt guessed text/plain because it saw
> only a few lobins in the file. However, Mutt failed to notice that
> there were bare CRs in the file when choosing the transfer
> encoding.  The attached patch checks info->binary even for the
> text/plain case.  I just tested this and it correctly chose base64
> encoding for the file.

While the problem clearly lies with mutt mis-treating the PDF file  
as text, I'm not entirely sure that just changing  
content-transfer-encodings is the right kind of fix...  This may  
also be a problem with either the old MIME encoder (which is what I  
tend to believe), or it may come from the special-casing text mode  
has to do for CR-LF sequences (i.e., line ends), regardless of the  
content transfer encoding.  If it's the latter, there's ultimately  
nothing we can do about this - apart from not using text mode to
transfer binary files.

In order to test things, I saved the quoted-printable version of the 
file to my hard disk, and then sent it to myself using mutt-1.4 and  
1.5 (as text/plain, with quoted-printable encoding), and then saved  
the file.  The new file was identical to the old one, down to the  
byte.

Can you guys reproduce this?  Or was I already using a mis-treated  
version of the file as my test case?

(MD5 checksum: 57631c6d362944c179c24cbe1512ce2e)

-- 
Thomas Roessler[EMAIL PROTECTED]



Re: Mutt guessing wrong encoding for outgoing PDFs?

2002-09-06 Thread Brian Grayson

  I'm attaching three files:

  rep1k (quoted -- this is what came up).  This is the first 1K
of the PDF that misbehaved, and it decided wrong.

  rep.5k (8bit -- this is what came up, so it guessed right).
This is only the first 512 bytes of the PDF.

  rep1k, forced to use base64 encoding, so that you can see
what's _really_ in there, and so you can play around yourself.

  Let me know if you need more info!

  Also, for my own education, which file contains the guessing
code?

  Thanks!

  Brian
-- 
Brian Grayson, SysPerf (System Performance, Modeling, and Simulation)
[EMAIL PROTECTED]
Somerset Design Center
Motorola
Austin, TX


%PDF-1.2
%âãÏÓ
317 0 obj
 
/Linearized 1 
/O 319 
/H [ 728 767 ] 
/L 363450 
/E 62838 
/N 100 
/T 
356991 
 
endobj
 xref
317 16 
16 0 n 
000671 0 n 
001495 0 n 
001653 0 n 
001885 0 n 
001996 0 n 
002102 0 n 
002283 0 n 
002335 0 n 
004243 0 n 
004351 0 n 
004457 0 n 
061520 0 n 
061598 0 n 
000728 0 n 
001473 0 n 
trailer

/Size 333
/Info 305 0 R 
/Root 318 0 R 
/Prev 356980 
/ID[f420f46189f89a9d08ec59e2f57273f3f420f46189f89a9d08ec59e2f57273f3]

startxref
0
%%EOF

318 0 obj
 
/Type /Catalog 
/Pages 307 0 R 
 
endobj
331 0 obj
 /S 
1193 /Filter /FlateDecode /Length 332 0 R  
stream
H‰ÜTMhA~Éî̼]—˜h„X¨¤ADëAR¥Ö¦ÁšUB«Ùˆ`J~ŠÞ¢ibۋ6M×^ì!  {«xS/ZDÄC±hKñ§ˆX„XZ©*
Î6ݤ‰'¯vvç½÷ýÌ
Ì€`@v(‡×8ÿ…ÇÖ_ä.¶’;R‚ÊtŒœ–sdIìq.°Ý8K’ñ³kòSÅÊڈŒTqYšÀ¦i?»‡.é9   
 ùûŽ5ˆÅ3Ò.:ÃÚ%#;èá=ÇI)))°{Ää‚Ô„+4ƒõ

%PDF-1.2
%âãÏÓ
317 0 obj
 
/Linearized 1 
/O 319 
/H [ 728 767 ] 
/L 363450 
/E 62838 
/N 100 
/T 
356991 
 
endobj
 xref
317 16 
16 0 n 
000671 0 n 
001495 0 n 
001653 0 n 
001885 0 n 
001996 0 n 
002102 0 n 
002283 0 n 
002335 0 n 
004243 0 n 
004351 0 n 
004457 0 n 
061520 0 n 
061598 0 n 
000728 0 n 
001473 0 n 
traile



Re: Mutt guessing wrong encoding for outgoing PDFs?

2002-09-06 Thread Brian Grayson

  Hm.  I have 1.2.5 source locally, and it looks like in
mutt_set_encoding() in sendlib.c, the following logic may be
faulty:

static void mutt_set_encoding (BODY *b, CONTENT *info)
{
   if (b->type == TYPETEXT)
   {
      if (info->lobin || info->linemax > 990 ||
          (info->from && option (OPTENCODEFROM)))
         b->encoding = ENCQUOTEDPRINTABLE;
      else if (info->hibin)
         b->encoding = option (OPTALLOW8BIT) ?
            ENC8BIT : ENCQUOTEDPRINTABLE;
      else
         b->encoding = ENC7BIT;
   }
   ...
}

  Note that if hibin is greater than zero, but lobin is also
greater than zero, we'll use quoted-printable.

  Shouldn't it be something more like:

   if (info->hibin) {
      b->encoding = option (OPTALLOW8BIT) ? ENC8BIT : ENCQUOTEDPRINTABLE;
   } else if (info->lobin || info->linemax > 990 ||
              (info->from && option (OPTENCODEFROM)))
   {
      b->encoding = ENCQUOTEDPRINTABLE;
   }
   else
      b->encoding = ENC7BIT;
   ...

   That is, if we have 8-bit characters, don't even consider
quoted unless OPTALLOW8BIT is false.
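
To make the difference between the two orderings concrete, here is a
small standalone sketch.  It is not Mutt code: the enum, the
text_info struct and the two function names are stand-ins invented
for illustration, with the two options passed in as plain booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the encoding constants and the
 * per-attachment flags discussed above. */
enum cte { CTE_7BIT, CTE_8BIT, CTE_QUOTED_PRINTABLE };

struct text_info
{
  bool lobin;       /* saw low control characters */
  bool hibin;       /* saw bytes with the high bit set */
  bool long_lines;  /* some line is longer than 990 bytes */
  bool from_line;   /* some line starts with "From " */
};

/* Ordering as in the 1.2.5 snippet: lobin is checked first. */
enum cte pick_original (const struct text_info *info,
                        bool allow_8bit, bool encode_from)
{
  assert (info != NULL);
  if (info->lobin || info->long_lines || (info->from_line && encode_from))
    return CTE_QUOTED_PRINTABLE;
  if (info->hibin)
    return allow_8bit ? CTE_8BIT : CTE_QUOTED_PRINTABLE;
  return CTE_7BIT;
}

/* Proposed ordering: hibin is checked first. */
enum cte pick_proposed (const struct text_info *info,
                        bool allow_8bit, bool encode_from)
{
  assert (info != NULL);
  if (info->hibin)
    return allow_8bit ? CTE_8BIT : CTE_QUOTED_PRINTABLE;
  if (info->lobin || info->long_lines || (info->from_line && encode_from))
    return CTE_QUOTED_PRINTABLE;
  return CTE_7BIT;
}
```

With both lobin and hibin set and 8-bit allowed, pick_original
returns CTE_QUOTED_PRINTABLE while pick_proposed returns CTE_8BIT.
One side effect worth noting: with the proposed ordering, content
that has high-bit bytes and over-long lines would also go out as
8bit rather than quoted-printable.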

  Brian
--
Brian Grayson, SysPerf (System Performance, Modeling, and Simulation)
[EMAIL PROTECTED]
Somerset Design Center
Motorola
Austin, TX



Re: Mutt guessing wrong encoding for outgoing PDFs?

2002-09-06 Thread Michael Elkins

Brian Grayson wrote:
> Hm.  I have 1.2.5 source locally, and it looks like in
> mutt_set_encoding() in sendlib.c, the following logic may be
> faulty:

I just noticed that you are using an extremely ancient version of Mutt
(0.95).  Please try using Mutt 1.4, which is the current stable version.
The logic for picking the CTE is much more complex now, and it should
address your issue.