Re: Mutt guessing wrong encoding for outgoing PDFs?
On Mon, Sep 09, 2002 at 09:42:21AM -0700, Michael Elkins wrote:
> Brian Grayson wrote:
> > I downloaded 1.4 on Friday just to see, and the same problem
> > occurs. The fundamental problem is once the CTE code sees a
> > nonzero value of lobin, it goes into quoted, regardless of
> > whether hibin is nonzero. The following patch does the right
> > thing for my testcase here, but I don't know if there's a good
> > reason why the lobin/quotable check currently ignores whether
> > there are any hibins or not.
> >
> > After a bit of inspection, the file rep.5k has hibins and _no_
> > lobins, and hence goes properly into 8bit encoding. But the file
> > rep1k has a lobin (0x0b at offset 0x340, for example), so it
> > short-circuits into quoted-printable. Try mailing the
> > base64-encoded version of that to yourself, and it should choose
> > quotable, even in 1.4.
>
> Thanks for the extra info. I looked into this more closely, and I
> see that there are a couple of factors that come into play in this
> situation.
>
> First, I noticed that your PDF attachment was labeled improperly as
> text/plain. This is not so bad in itself, but the piece of code that
> checks for which transfer encoding to use assumes that it really is
> text, which is a problem.
>
> Since there was no extension to the file, Mutt fell back to making
> a guess as to whether the file was of type text/plain or
> application/octet-stream. Mutt guessed text/plain because it saw
> only a few lobins in the file. However, Mutt failed to notice that
> there were bare CRs in the file when choosing the transfer encoding.
>
> The attached patch checks info->binary even for the text/plain case.
> I just tested this and it correctly chose base64 encoding for the
> file.

Argggh! I found out the fundamental problem. It's not with the
encoding type -- quoted-printable should be fine even in the presence
of 8-bit characters (right?), except we have an Exchange server as our
mail server. The Exchange mail server is apparently un-encoding the
quoted-printable attachment, and then re-encoding it buggily.
I visually verified this by telnet'ing to the SMTP port, and
cut-and-pasting a MIME mail with a quoted-printable attachment. If I
send that mail to \bgrayson, I get different results than if the mail
goes through our Exchange server. So it appears to me that Exchange
goes into the mail message and mucks around, and manages to also
corrupt some mail while it's in there.

For example, I sent (and received from \bgrayson):

  %PDF-1.2=0D%=E2=E3=CF=D3=0D=0A317 0 obj=0D =0D/Linearized 1 =0D/O 319 =0D=/H [ 728 767 ] =0D/L 363450 =0D/E 62838 =0D/N 100 =0D/T 356991 =0D =0Dend= obj=0D xref=0D317 16 =

When I let Exchange touch it, I end up with:

  %PDF-1.2=0D%=E2=E3=CF=D3 317 0 obj=0D =0D/Linearized 1 =0D/O 319 =0D/H [ 728 767 ] =0D/L = 363450 =0D/E 62838 =0D/N 100 =0D/T 356991 =0D =0Dendobj=0D=

So, for a solution, is there an easy way for me to tell mutt, "Never
use quoted-printable, because the world unfortunately has Exchange
servers"? Has anyone else seen this problem?

Thanks. And sorry about the wild goose chase -- I didn't realize until
now that quoted-printable should be able to handle arbitrary binaries
without corruption (at least I _think_ it should be able to do so).

(Microsoft just lost more respect from me. Which is amazing, since I
didn't think there was any more to lose!)

Brian
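For reference, a minimal sketch of why quoted-printable can carry arbitrary bytes when the data is treated as binary: every CR, LF, space, and 8-bit byte is written as =XX, and lines are folded only with soft breaks ("=" followed by CRLF) that a decoder removes again. This is not Mutt's encoder. The names qp_encode_binary and qp_decode are hypothetical, and the decoder only understands the output of this encoder (uppercase hex, CRLF soft breaks).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Mutt code: quoted-printable encode `len` raw
 * bytes as binary data. Writes a NUL-terminated string to `out` and returns
 * its length, or (size_t)-1 if `out` is too small. */
size_t qp_encode_binary(const unsigned char *in, size_t len,
                        char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0, col = 0;

    assert(out != NULL && (in != NULL || len == 0));
    if (outsz == 0)
        return (size_t)-1;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        /* Printable ASCII other than '=' passes through. Everything else,
         * including CR, LF, tab and space, becomes =XX. */
        int literal = (c >= 33 && c <= 126 && c != '=');
        size_t need = literal ? 1 : 3;

        /* Soft line break so that no encoded line exceeds 76 characters,
         * counting the trailing '='. */
        if (col + need > 75) {
            if (o + 3 >= outsz)
                return (size_t)-1;
            out[o++] = '=';
            out[o++] = '\r';
            out[o++] = '\n';
            col = 0;
        }
        if (o + need >= outsz)
            return (size_t)-1;
        if (literal) {
            out[o++] = (char)c;
        } else {
            out[o++] = '=';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        }
        col += need;
    }
    out[o] = '\0';
    return o;
}

/* Value of an uppercase hex digit, or -1 if `c` is not one. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical inverse of qp_encode_binary. Returns the number of decoded
 * bytes, or (size_t)-1 on malformed input or if `out` is too small. */
size_t qp_decode(const char *in, unsigned char *out, size_t outsz)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        /* A soft line break carries no data; skip it. */
        if (in[0] == '=' && in[1] == '\r' && in[2] == '\n') {
            in += 3;
            continue;
        }
        if (o >= outsz)
            return (size_t)-1;
        if (in[0] == '=') {
            int hi = hexval(in[1]);
            int lo = (hi >= 0) ? hexval(in[2]) : -1;
            if (hi < 0 || lo < 0)
                return (size_t)-1;
            out[o++] = (unsigned char)(hi * 16 + lo);
            in += 3;
        } else {
            out[o++] = (unsigned char)*in++;
        }
    }
    return o;
}
```

Fed the first sixteen bytes of the PDF (through the CRLF after the binary comment), this encoder produces the same opening as the sample sent above, "%PDF-1.2=0D%=E2=E3=CF=D3=0D=0A". Unlike that sample, it also writes spaces as =20, which is permitted but more conservative than necessary.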
Re: Mutt guessing wrong encoding for outgoing PDFs?
On Fri, Sep 06, 2002 at 11:21:09PM -0700, Michael Elkins wrote:
> Brian Grayson wrote:
> > Hm. I have 1.2.5 source locally, and it looks like in
> > mutt_set_encoding() in sendlib.c, the following logic may be
> > faulty:
>
> I just noticed that you are using an extremely ancient version of
> Mutt (0.95). Please try using Mutt 1.4, which is the current stable
> version. The logic for picking the CTE is much more complex now, and
> it should address your issue.

I downloaded 1.4 on Friday just to see, and the same problem occurs.
The fundamental problem is once the CTE code sees a nonzero value of
lobin, it goes into quoted, regardless of whether hibin is nonzero.
The following patch does the right thing for my testcase here, but I
don't know if there's a good reason why the lobin/quotable check
currently ignores whether there are any hibins or not.

After a bit of inspection, the file rep.5k has hibins and _no_ lobins,
and hence goes properly into 8bit encoding. But the file rep1k has a
lobin (0x0b at offset 0x340, for example), so it short-circuits into
quoted-printable. Try mailing the base64-encoded version of that to
yourself, and it should choose quotable, even in 1.4.

Brian
--
Brian Grayson, SysPerf (System Performance, Modeling, and Simulation)
Somerset Design Center
Motorola
Austin, TX

--- sendlib.c   Sat Apr 20 02:25:49 2002
+++ sendlib.c.mod       Fri Sep  6 21:27:18 2002
@@ -1196,10 +1196,12 @@
   if (b->type == TYPETEXT)
   {
     char *chsname = mutt_get_body_charset (send_charset, sizeof (send_charset), b);
-    if ((info->lobin && strncasecmp (chsname, "iso-2022", 8)) || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
-      b->encoding = ENCQUOTEDPRINTABLE;
-    else if (info->hibin)
+    if (info->hibin)
+    {
       b->encoding = option (OPTALLOW8BIT) ? ENC8BIT : ENCQUOTEDPRINTABLE;
+    }
+    else if ((info->lobin && strncasecmp (chsname, "iso-2022", 8)) || info->linemax >
+990 || (info->from && option (OPTENCODEFROM)))
+      b->encoding = ENCQUOTEDPRINTABLE;
     else
       b->encoding = ENC7BIT;
   }
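The effect of the reordering can be checked outside Mutt with a small standalone sketch. Everything here is a stand-in rather than Mutt's own code: struct content_info, the enum, and both function names are hypothetical. The two booleans replace option (OPTALLOW8BIT) and option (OPTENCODEFROM), and is_iso2022() replaces the strncasecmp() test.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the fields and constants the patch touches. */
enum encoding { ENC_7BIT, ENC_8BIT, ENC_QUOTED_PRINTABLE };

struct content_info {
    long hibin;    /* count of bytes with the high bit set */
    long lobin;    /* count of unusual control characters */
    long linemax;  /* length of the longest line */
    bool from;     /* some line starts with "From " */
};

/* Stand-in for strncasecmp (chsname, "iso-2022", 8) == 0. */
static bool is_iso2022(const char *charset)
{
    static const char prefix[] = "iso-2022";
    for (size_t i = 0; prefix[i] != '\0'; i++) {
        if (tolower((unsigned char)charset[i]) != prefix[i])
            return false;
    }
    return true;
}

/* Ordering before the patch: any lobin wins, even when hibins are present. */
enum encoding pick_text_encoding_stock(const struct content_info *info,
                                       const char *charset,
                                       bool allow_8bit, bool encode_from)
{
    assert(info != NULL && charset != NULL);
    if ((info->lobin && !is_iso2022(charset)) || info->linemax > 990 ||
        (info->from && encode_from))
        return ENC_QUOTED_PRINTABLE;
    if (info->hibin)
        return allow_8bit ? ENC_8BIT : ENC_QUOTED_PRINTABLE;
    return ENC_7BIT;
}

/* Ordering after the patch: hibins are checked first. */
enum encoding pick_text_encoding_patched(const struct content_info *info,
                                         const char *charset,
                                         bool allow_8bit, bool encode_from)
{
    assert(info != NULL && charset != NULL);
    if (info->hibin)
        return allow_8bit ? ENC_8BIT : ENC_QUOTED_PRINTABLE;
    if ((info->lobin && !is_iso2022(charset)) || info->linemax > 990 ||
        (info->from && encode_from))
        return ENC_QUOTED_PRINTABLE;
    return ENC_7BIT;
}
```

For a body with both hibins and a lobin, like rep1k, and 8-bit allowed, the stock ordering returns quoted-printable and the patched ordering returns 8bit. One side effect of the patched ordering is visible in the sketch: a body with hibins and a line longer than 990 characters is also sent as 8bit, because the linemax test is no longer reached.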
gbnet.net [was Re: After-editing hook?]
On Fri, Sep 06, 2002 at 04:28:27AM -0700, David T-G wrote:
> Jeff --
>
> BTW, you should use the @mutt.org address for the mutt-users list
> rather than the @gbnet address. Yes, the gbnet address leaks out now
> and again (I don't really know how, but think it might be
> digest-related), but we're trying to get it squashed once and for
> all.

I think I sent my post to gbnet.net because once I subscribed, the
welcome message mentioned [EMAIL PROTECTED], and so I just sent to
[EMAIL PROTECTED], since I'm used to majordomo-run lists. Someone might
want to tweak the welcome message to remove all references to
[EMAIL PROTECTED] and [EMAIL PROTECTED] if you truly want to hide that
domain name.

Brian
--
Brian Grayson, SysPerf (System Performance, Modeling, and Simulation)
[EMAIL PROTECTED]
Somerset Design Center
Motorola
Austin, TX
Re: Mutt guessing wrong encoding for outgoing PDFs?
I'm attaching three files:

  rep1k (quoted -- this is what came up). This is the first 1K of the
  PDF that misbehaved, and it decided wrong.

  rep.5k (8bit -- this is what came up, so it guessed right). This is
  only the first 512 bytes of the PDF.

  rep1k, forced to use base64 encoding, so that you can see what's
  _really_ in there, and so you can play around yourself.

Let me know if you need more info! Also, for my own education, which
file contains the guessing code?

Thanks!

Brian
--
Brian Grayson, SysPerf (System Performance, Modeling, and Simulation)
[EMAIL PROTECTED]
Somerset Design Center
Motorola
Austin, TX

%PDF-1.2 %âãÏÓ 317 0 obj /Linearized 1 /O 319 /H [ 728 767 ] /L 363450 /E 62838 /N 100 /T 356991 endobj xref 317 16 16 0 n 000671 0 n 001495 0 n 001653 0 n 001885 0 n 001996 0 n 002102 0 n 002283 0 n 002335 0 n 004243 0 n 004351 0 n 004457 0 n 061520 0 n 061598 0 n 000728 0 n 001473 0 n trailer /Size 333 /Info 305 0 R /Root 318 0 R /Prev 356980 /ID[f420f46189f89a9d08ec59e2f57273f3f420f46189f89a9d08ec59e2f57273f3] startxref 0 %%EOF 318 0 obj /Type /Catalog /Pages 307 0 R endobj 331 0 obj /S 1193 /Filter /FlateDecode /Length 332 0 R stream
Re: Mutt guessing wrong encoding for outgoing PDFs?
Hm. I have 1.2.5 source locally, and it looks like in
mutt_set_encoding() in sendlib.c, the following logic may be faulty:

  static void mutt_set_encoding (BODY *b, CONTENT *info)
  {
    if (b->type == TYPETEXT)
    {
      if (info->lobin || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
        b->encoding = ENCQUOTEDPRINTABLE;
      else if (info->hibin)
        b->encoding = option (OPTALLOW8BIT) ? ENC8BIT : ENCQUOTEDPRINTABLE;
      else
        b->encoding = ENC7BIT;
    }
    ...
  }

Note that if hibin is greater than zero, but lobin is also greater
than zero, we'll use quoted-printable. Shouldn't it be something more
like:

  if (info->hibin)
  {
    b->encoding = option (OPTALLOW8BIT) ? ENC8BIT : ENCQUOTEDPRINTABLE;
  }
  else if (info->lobin || info->linemax > 990 || (info->from && option (OPTENCODEFROM)))
  {
    b->encoding = ENCQUOTEDPRINTABLE;
  }
  else
    b->encoding = ENC7BIT;
  ...

That is, if we have 8-bit characters, don't even consider quoted
unless OPTALLOW8BIT is false.

Brian
--
Brian Grayson, SysPerf (System Performance, Modeling, and Simulation)
[EMAIL PROTECTED]
Somerset Design Center
Motorola
Austin, TX
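To make the counters in that logic concrete, here is a rough sketch of how hibin, lobin and linemax could be gathered in one pass over the data. The definitions are assumptions for illustration, not taken from the Mutt source: a hibin is any byte with the high bit set, and a lobin is DEL or a control character other than tab, CR and LF. The names scan_counts and scan_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type; the field names mirror the ones in the thread. */
struct scan_counts {
    size_t hibin;    /* bytes >= 0x80 */
    size_t lobin;    /* DEL, or control characters other than tab, CR, LF */
    size_t linemax;  /* longest run of bytes between LFs */
};

/* Assumed classification, for illustration only: scan the WHOLE buffer once
 * and count the byte classes that the encoding decision consults. */
struct scan_counts scan_bytes(const unsigned char *buf, size_t len)
{
    struct scan_counts c = { 0, 0, 0 };
    size_t linelen = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = buf[i];

        if (ch == '\n') {
            /* End of line: record its length and start a new one. */
            if (linelen > c.linemax)
                c.linemax = linelen;
            linelen = 0;
            continue;
        }
        linelen++;
        if (ch & 0x80)
            c.hibin++;
        else if ((ch < 32 && ch != '\t' && ch != '\r') || ch == 127)
            c.lobin++;
    }
    /* The last line may not end in LF. */
    if (linelen > c.linemax)
        c.linemax = linelen;
    return c;
}
```

Under these assumed definitions, a buffer containing both a 0x0b byte and bytes such as 0xE2 ends up with lobin and hibin both nonzero, which is exactly the combination the logic above sends to quoted-printable.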
Mutt guessing wrong encoding for outgoing PDFs?
I didn't see this in the FAQ or in a search of the archives, so just
point me to the right spot if this is an FAQ that I missed somehow.

When sending some PDFs, mutt is incorrectly guessing that 'quoted
printable' is sufficient -- the PDF in question doesn't contain 8-bit
characters in the first several dozen lines, and I'm guessing mutt only
scans the first several before making its choice? Using the wrong
encoding causes CRLF etc. to be munged deep in the 8-bit characters,
leading to corrupted PDFs.

Is there any way to control mutt's behavior to say 'always send PDF
files as base64', sort of like a reverse mailcap, or to make it check
more thoroughly?

Thanks!

Brian
--
Brian Grayson, SysPerf (System Performance, Modeling, and Simulation)
[EMAIL PROTECTED]
Somerset Design Center
Motorola
Austin, TX
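As background on why base64 is the safe choice for a PDF: its output alphabet contains only letters, digits, '+', '/' and '=', so no CR or LF from the original data can appear in the encoded body and be rewritten in transit. Below is a minimal sketch of the encoding itself, not Mutt's implementation. The function name base64_encode is hypothetical, and the sketch omits the 76-character line wrapping that a MIME body needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Mutt code: base64-encode `len` bytes from `in`
 * into `out` as one unwrapped, NUL-terminated string. Returns the encoded
 * length, or (size_t)-1 if `out` is too small. */
size_t base64_encode(const unsigned char *in, size_t len,
                     char *out, size_t outsz)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t need = 4 * ((len + 2) / 3);   /* 4 output chars per 3 input bytes */
    size_t o = 0;

    assert(out != NULL && (in != NULL || len == 0));
    if (outsz < need + 1)
        return (size_t)-1;

    for (size_t i = 0; i < len; i += 3) {
        /* Pack up to three bytes into a 24-bit group. */
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < len) v |= (unsigned long)in[i + 1] << 8;
        if (i + 2 < len) v |= (unsigned long)in[i + 2];

        /* Emit four 6-bit values, padding with '=' past the end of input. */
        out[o++] = tbl[(v >> 18) & 0x3F];
        out[o++] = tbl[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 0x3F] : '=';
        out[o++] = (i + 2 < len) ? tbl[v & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}
```

With this encoder the four bytes "%PDF" become "JVBERg==", and a bare CRLF pair becomes "DQo=".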