Re: Mutt guessing wrong encoding for outgoing PDFs?
On Mon, Sep 09, 2002 at 09:42:21AM -0700, Michael Elkins wrote:
> Brian Grayson wrote:
> > I downloaded 1.4 on Friday just to see, and the same problem
> > occurs. The fundamental problem is once the CTE code sees a
> > nonzero value of lobin, it goes into quoted-printable, regardless
> > of whether hibin is nonzero. The following patch does the right
> > thing for my testcase here, but I don't know if there's a good
> > reason why the lobin/quoted-printable check currently ignores
> > whether there are any hibins or not.
> >
> > After a bit of inspection, the file rep.5k has hibins and _no_
> > lobins, and hence goes properly into 8bit encoding. But the file
> > rep1k has a lobin (0x0b at offset 0x340, for example), so it
> > short-circuits into quoted-printable. Try mailing the
> > base64-encoded version of that to yourself, and it should choose
> > quoted-printable, even in 1.4.
>
> Thanks for the extra info. I looked into this more closely, and I
> see that there are a couple of factors that come into play in this
> situation.
>
> First, I noticed that your PDF attachment was labeled improperly as
> text/plain. This is not so bad in itself, but the piece of code that
> checks for which transfer encoding to use assumes that it really is
> text, which is a problem.
>
> Since there was no extension to the file, Mutt fell back to guessing
> whether the file was of type text/plain or application/octet-stream.
> Mutt guessed text/plain because it saw only a few lobins in the
> file. However, Mutt failed to notice that there were bare CRs in the
> file when choosing the transfer encoding.
>
> The attached patch checks info->binary even for the text/plain case.
> I just tested this and it correctly chose base64 encoding for the
> file.

Argggh! I found out the fundamental problem. It's not with the
encoding type -- quoted-printable should be fine even in the presence
of 8-bit characters (right?), except we have an Exchange server as our
mail server. The Exchange mail server is apparently un-encoding the
quoted-printable attachment, and then re-encoding it buggily.

I visually verified this by telnet'ing to the SMTP port, and
cut-and-pasting a MIME mail with a quoted-printable attachment. If I
send that mail to \bgrayson, I get different results than if the mail
goes through our Exchange server. So it appears to me that Exchange
goes into the mail message and mucks around, and manages to also
corrupt some mail while it's in there.

For example, I sent (and received from \bgrayson):

    %PDF-1.2=0D%=E2=E3=CF=D3=0D=0A317 0 obj=0D =0D/Linearized 1 =0D/O 319 =0D=/H [ 728 767 ] =0D/L 363450 =0D/E 62838 =0D/N 100 =0D/T 356991 =0D =0Dend= obj=0D xref=0D317 16 =

When I let Exchange touch it, I end up with:

    %PDF-1.2=0D%=E2=E3=CF=D3 317 0 obj=0D =0D/Linearized 1 =0D/O 319 =0D/H [ 728 767 ] =0D/L = 363450 =0D/E 62838 =0D/N 100 =0D/T 356991 =0D =0Dendobj=0D=

So, for a solution, is there an easy way for me to tell mutt, "Never
use quoted-printable because the world unfortunately has Exchange
servers"? Has anyone else seen this problem?

Thanks. And sorry about the wild goose chase -- I didn't realize until
now that quoted-printable should be able to handle arbitrary binaries
without corruption (at least I _think_ it should be able to do so).

(Microsoft just lost more respect from me. Which is amazing, since I
didn't think there was any more to lose!)

Brian
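The claim above, that quoted-printable can carry arbitrary bytes without loss as long as nothing re-encodes it in transit, can be checked with a small round trip. The following is only a sketch, not Mutt's encoder: the function names are hypothetical, and it assumes a "binary" flavour of quoted-printable in which CR, LF, and space are always escaped rather than treated as line ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Encode n bytes as quoted-printable in "binary" fashion: every byte that
 * is not printable ASCII (plus '=' and space) becomes =XX, including CR
 * and LF. Soft line breaks ("=\r\n") keep encoded lines within 76 chars.
 * `out` must have room for 4*n + 4 bytes. Returns the encoded length. */
size_t qp_encode(const unsigned char *in, size_t n, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0, col = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        int literal = (c >= 33 && c <= 126 && c != '=');
        size_t need = literal ? 1 : 3;

        if (col + need > 75) {      /* keep room for the trailing '=' */
            out[o++] = '=';
            out[o++] = '\r';
            out[o++] = '\n';
            col = 0;
        }
        if (literal) {
            out[o++] = (char)c;
        } else {
            out[o++] = '=';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        }
        col += need;
    }
    return o;
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode n encoded characters. `out` must have room for n bytes.
 * Returns the decoded length. */
size_t qp_decode(const char *in, size_t n, unsigned char *out)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i] != '=') {
            out[o++] = (unsigned char)in[i];
        } else if (i + 2 < n && in[i + 1] == '\r' && in[i + 2] == '\n') {
            i += 2;                 /* soft line break: emits nothing */
        } else if (i + 2 < n && hexval(in[i + 1]) >= 0 && hexval(in[i + 2]) >= 0) {
            out[o++] = (unsigned char)(hexval(in[i + 1]) * 16 + hexval(in[i + 2]));
            i += 2;
        } else {
            out[o++] = '=';         /* malformed escape: pass through */
        }
    }
    return o;
}

/* Returns 1 if encoding and then decoding reproduces the input exactly. */
int qp_round_trips(const unsigned char *in, size_t n)
{
    char *enc = malloc(4 * n + 4);
    unsigned char *dec = malloc(4 * n + 4);
    int ok = 0;

    if (enc != NULL && dec != NULL) {
        size_t elen = qp_encode(in, n, enc);
        size_t dlen = qp_decode(enc, elen, dec);
        ok = (dlen == n && memcmp(dec, in, n) == 0);
    }
    free(enc);
    free(dec);
    return ok;
}
```

Feeding the first bytes of the PDF header through `qp_encode` gives the same `%PDF-1.2=0D%=E2=E3=CF=D3=0D=0A` prefix as the message that was received intact. The Exchange version has lost the `=0D=0A`, so it no longer decodes to the original bytes.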
Re: Mutt guessing wrong encoding for outgoing PDFs?
On Fri, Sep 06, 2002 at 11:21:09PM -0700, Michael Elkins wrote:
> Brian Grayson wrote:
> > Hm. I have 1.2.5 source locally, and it looks like in
> > mutt_set_encoding() in sendlib.c, the following logic may be
> > faulty:
>
> I just noticed that you are using an extremely ancient version of
> Mutt (0.95). Please try using Mutt 1.4, which is the current stable
> version. The logic for picking the CTE is much more complex now, and
> it should address your issue.

I downloaded 1.4 on Friday just to see, and the same problem occurs.
The fundamental problem is once the CTE code sees a nonzero value of
lobin, it goes into quoted-printable, regardless of whether hibin is
nonzero. The following patch does the right thing for my testcase
here, but I don't know if there's a good reason why the
lobin/quoted-printable check currently ignores whether there are any
hibins or not.

After a bit of inspection, the file rep.5k has hibins and _no_ lobins,
and hence goes properly into 8bit encoding. But the file rep1k has a
lobin (0x0b at offset 0x340, for example), so it short-circuits into
quoted-printable. Try mailing the base64-encoded version of that to
yourself, and it should choose quoted-printable, even in 1.4.

Brian
--
Brian Grayson, SysPerf (System Performance, Modeling, and Simulation)
Somerset Design Center
Motorola
Austin, TX

--- sendlib.c	Sat Apr 20 02:25:49 2002
+++ sendlib.c.mod	Fri Sep  6 21:27:18 2002
@@ -1196,10 +1196,12 @@
   if (b->type == TYPETEXT)
   {
     char *chsname = mutt_get_body_charset (send_charset, sizeof (send_charset), b);
-    if ((info->lobin && strncasecmp (chsname, "iso-2022", 8)) || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
-      b->encoding = ENCQUOTEDPRINTABLE;
-    else if (info->hibin)
+    if (info->hibin)
+    {
       b->encoding = option (OPTALLOW8BIT) ? ENC8BIT : ENCQUOTEDPRINTABLE;
+    }
+    else if ((info->lobin && strncasecmp (chsname, "iso-2022", 8)) || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
+      b->encoding = ENCQUOTEDPRINTABLE;
     else
       b->encoding = ENC7BIT;
   }
Re: Mutt guessing wrong encoding for outgoing PDFs?
Brian Grayson wrote:
> I downloaded 1.4 on Friday just to see, and the same problem occurs.
> The fundamental problem is once the CTE code sees a nonzero value of
> lobin, it goes into quoted-printable, regardless of whether hibin is
> nonzero. The following patch does the right thing for my testcase
> here, but I don't know if there's a good reason why the
> lobin/quoted-printable check currently ignores whether there are any
> hibins or not.
>
> After a bit of inspection, the file rep.5k has hibins and _no_
> lobins, and hence goes properly into 8bit encoding. But the file
> rep1k has a lobin (0x0b at offset 0x340, for example), so it
> short-circuits into quoted-printable. Try mailing the base64-encoded
> version of that to yourself, and it should choose quoted-printable,
> even in 1.4.

Thanks for the extra info. I looked into this more closely, and I see
that there are a couple of factors that come into play in this
situation.

First, I noticed that your PDF attachment was labeled improperly as
text/plain. This is not so bad in itself, but the piece of code that
checks for which transfer encoding to use assumes that it really is
text, which is a problem.

Since there was no extension to the file, Mutt fell back to guessing
whether the file was of type text/plain or application/octet-stream.
Mutt guessed text/plain because it saw only a few lobins in the file.
However, Mutt failed to notice that there were bare CRs in the file
when choosing the transfer encoding.

The attached patch checks info->binary even for the text/plain case.
I just tested this and it correctly chose base64 encoding for the
file.

Index: sendlib.c
===================================================================
RCS file: /home/roessler/cvs/mutt/sendlib.c,v
retrieving revision 2.94.2.5
diff -u -r2.94.2.5 sendlib.c
--- sendlib.c	31 May 2002 16:59:39 -0000	2.94.2.5
+++ sendlib.c	9 Sep 2002 16:32:21 -0000
@@ -1196,7 +1196,17 @@
   if (b->type == TYPETEXT)
   {
     char *chsname = mutt_get_body_charset (send_charset, sizeof (send_charset), b);
-    if ((info->lobin && strncasecmp (chsname, "iso-2022", 8)) || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
+
+    /*
+     * given a lack of info about what the file is from the mime-types file,
+     * Mutt will make a guess as to whether or not the file is likely
+     * text/plain or application/octet-stream based upon statistical
+     * evidence.  It is still possible that a binary file (one with a bare
+     * CR) might occur, so we need to account for it here.
+     */
+    if (info->binary)
+      b->encoding = ENCBASE64;
+    else if ((info->lobin && strncasecmp (chsname, "iso-2022", 8)) || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
       b->encoding = ENCQUOTEDPRINTABLE;
     else if (info->hibin)
       b->encoding = option (OPTALLOW8BIT) ? ENC8BIT : ENCQUOTEDPRINTABLE;
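To make the idea in this patch concrete, here is a rough, self-contained sketch of a content scan followed by the encoding choice with the binary check first. It is an assumption-laden model, not Mutt's code: `struct scan_info`, `scan_bytes`, and `choose_text_encoding` are hypothetical names, the exact definitions of "lobin" and "binary" are guesses based on the discussion (control bytes, and bare CR or NUL, respectively), and the iso-2022 and From-line checks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the statistics Mutt gathers. */
struct scan_info {
    size_t lobin;    /* control bytes other than TAB, LF, CR, NUL */
    size_t hibin;    /* bytes with the high bit set (>= 0x80) */
    size_t linemax;  /* longest run of bytes between LF characters */
    int binary;      /* set if a bare CR or a NUL byte was seen */
};

enum cte { CTE_7BIT, CTE_8BIT, CTE_QUOTED_PRINTABLE, CTE_BASE64 };

/* Walk the buffer once and fill in the statistics. */
void scan_bytes(const unsigned char *p, size_t n, struct scan_info *info)
{
    size_t linelen = 0;

    assert(p != NULL && info != NULL);
    info->lobin = info->hibin = info->linemax = 0;
    info->binary = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = p[i];

        if (c == '\n') {
            if (linelen > info->linemax)
                info->linemax = linelen;
            linelen = 0;
            continue;
        }
        linelen++;
        if (c == '\r') {
            /* A CR that is not immediately followed by LF is "bare". */
            if (i + 1 >= n || p[i + 1] != '\n')
                info->binary = 1;
        } else if (c == 0) {
            info->binary = 1;
        } else if (c >= 0x80) {
            info->hibin++;
        } else if (c < 32 && c != '\t') {
            info->lobin++;
        }
    }
    if (linelen > info->linemax)
        info->linemax = linelen;
}

/* Choose an encoding for a part labeled as text, binary check first. */
enum cte choose_text_encoding(const struct scan_info *info, int allow_8bit)
{
    assert(info != NULL);
    if (info->binary)
        return CTE_BASE64;
    if (info->lobin || info->linemax > 990)
        return CTE_QUOTED_PRINTABLE;
    if (info->hibin)
        return allow_8bit ? CTE_8BIT : CTE_QUOTED_PRINTABLE;
    return CTE_7BIT;
}
```

With this ordering, a buffer that starts like the PDF in question (`%PDF-1.2`, then a CR with no LF after it) is flagged as binary on the first bare CR, so base64 is chosen whatever the lobin and hibin counts are.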
Re: Mutt guessing wrong encoding for outgoing PDFs?
On 2002-09-09 09:42:21 -0700, Michael Elkins wrote:
> Thanks for the extra info. I looked into this more closely, and I
> see that there are a couple of factors that come into play in this
> situation.
>
> First, I noticed that your PDF attachment was labeled improperly as
> text/plain. This is not so bad in itself, but the piece of code that
> checks for which transfer encoding to use assumes that it really is
> text, which is a problem.
>
> Since there was no extension to the file, Mutt fell back to guessing
> whether the file was of type text/plain or application/octet-stream.
> Mutt guessed text/plain because it saw only a few lobins in the
> file. However, Mutt failed to notice that there were bare CRs in the
> file when choosing the transfer encoding.
>
> The attached patch checks info->binary even for the text/plain case.
> I just tested this and it correctly chose base64 encoding for the
> file.

While the problem clearly lies with mutt mistreating the PDF file as
text, I'm not entirely sure that just changing
content-transfer-encodings is the right kind of fix...

This may also be a problem with either the old MIME encoder (which is
what I tend to believe), or it may come from the special-casing text
mode has to do for CR-LF sequences (i.e., line ends), regardless of
the content transfer encoding. If it's the latter, there's ultimately
nothing we can do about this, apart from not using text mode to
transfer binary files.

In order to test things, I saved the quoted-printable version of the
file to my hard disk, and then sent it to myself using mutt-1.4 and
1.5 (as text/plain, with quoted-printable encoding), and then saved
the file. The new file was identical to the old one, down to the byte.

Can you guys reproduce this? Or was I already using a mistreated
version of the file as my test case? (MD5 checksum:
57631c6d362944c179c24cbe1512ce2e)

--
Thomas Roessler [EMAIL PROTECTED]
Re: Mutt guessing wrong encoding for outgoing PDFs?
I'm attaching three files:

  rep1k   (quoted -- this is what came up). This is the first 1K of
          the PDF that misbehaved; Mutt picked the wrong encoding.
  rep.5k  (8bit -- this is what came up, so it guessed right). This
          is only the first 512 bytes of the PDF.
  rep1k   forced to use base64 encoding, so that you can see what's
          _really_ in there, and so you can play around yourself.

Let me know if you need more info! Also, for my own education, which
file contains the guessing code?

Thanks!

Brian
--
Brian Grayson, SysPerf (System Performance, Modeling, and Simulation)
[EMAIL PROTECTED]
Somerset Design Center
Motorola
Austin, TX

[attachments: rep1k, rep.5k, rep1k (base64); binary content not shown]
Re: Mutt guessing wrong encoding for outgoing PDFs?
Hm. I have 1.2.5 source locally, and it looks like in
mutt_set_encoding() in sendlib.c, the following logic may be faulty:

    static void mutt_set_encoding (BODY *b, CONTENT *info)
    {
      if (b->type == TYPETEXT)
      {
        if (info->lobin || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
          b->encoding = ENCQUOTEDPRINTABLE;
        else if (info->hibin)
          b->encoding = option (OPTALLOW8BIT) ? ENC8BIT : ENCQUOTEDPRINTABLE;
        else
          b->encoding = ENC7BIT;
      }
      ...
    }

Note that if hibin is greater than zero, but lobin is also greater
than zero, we'll use quoted-printable. Shouldn't it be something more
like:

    if (info->hibin)
    {
      b->encoding = option (OPTALLOW8BIT) ? ENC8BIT : ENCQUOTEDPRINTABLE;
    }
    else if (info->lobin || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
    {
      b->encoding = ENCQUOTEDPRINTABLE;
    }
    else
      b->encoding = ENC7BIT;
    ...

That is, if we have 8-bit characters, don't even consider
quoted-printable unless OPTALLOW8BIT is false.

Brian
--
Brian Grayson, SysPerf (System Performance, Modeling, and Simulation)
[EMAIL PROTECTED]
Somerset Design Center
Motorola
Austin, TX
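The difference between the two orderings above can be seen side by side in a small standalone model. This is a sketch only: `struct content`, `choose_current`, and `choose_proposed` are hypothetical names that mirror the two snippets, with Mutt's `option()` lookups replaced by plain int parameters.

```c
#include <assert.h>
#include <stddef.h>

enum cte { CTE_7BIT, CTE_8BIT, CTE_QUOTED_PRINTABLE };

/* Hypothetical stand-in for the fields of CONTENT used by the check. */
struct content {
    int lobin;    /* number of control characters seen */
    int hibin;    /* number of 8-bit characters seen */
    int linemax;  /* length of the longest line */
    int from;     /* nonzero if a line starts with "From " */
};

/* Order used in the 1.2.5 source quoted above: lobin is tested first. */
enum cte choose_current(const struct content *info, int allow_8bit, int encode_from)
{
    assert(info != NULL);
    if (info->lobin || info->linemax > 990 || (info->from && encode_from))
        return CTE_QUOTED_PRINTABLE;
    else if (info->hibin)
        return allow_8bit ? CTE_8BIT : CTE_QUOTED_PRINTABLE;
    return CTE_7BIT;
}

/* Proposed order: hibin is tested first. */
enum cte choose_proposed(const struct content *info, int allow_8bit, int encode_from)
{
    assert(info != NULL);
    if (info->hibin)
        return allow_8bit ? CTE_8BIT : CTE_QUOTED_PRINTABLE;
    else if (info->lobin || info->linemax > 990 || (info->from && encode_from))
        return CTE_QUOTED_PRINTABLE;
    return CTE_7BIT;
}
```

For input with both lobins and hibins and 8-bit allowed, `choose_current` returns quoted-printable and `choose_proposed` returns 8bit. For input with hibins only, both return 8bit. One consequence of the proposed order, as written, is that the long-line and From-line checks are also skipped whenever hibin is nonzero.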
Re: Mutt guessing wrong encoding for outgoing PDFs?
Brian Grayson wrote:
> Hm. I have 1.2.5 source locally, and it looks like in
> mutt_set_encoding() in sendlib.c, the following logic may be faulty:

I just noticed that you are using an extremely ancient version of Mutt
(0.95). Please try using Mutt 1.4, which is the current stable
version. The logic for picking the CTE is much more complex now, and
it should address your issue.