[Freeipa-devel] git patch email issues

John Dennis Fri, 12 Feb 2010 14:45:37 -0800

I spent the better part of today investigating why we appear to have problems with our patch emails when the contents are not 7-bit ASCII. I've been through the source code of git-format-patch and git-send-email, refreshed my memory of various RFCs, and have performed a number of experiments.


The Details:
------------

If an email, or any mime part of an email, does *not* specify a Content-Type with a charset parameter then the encoding defaults to 7-bit US-ASCII.

That hasn't been a problem in the past because virtually all our patches have been restricted to 7-bit ASCII, so we never really noticed the problem. However, more recently we have been sending files with UTF-8 encoded values and we started to see what appeared to be corruption in the patch. This was most noticeable when the mail passed through Mailman, some versions of which attempt to transcode the email to match the list preferences.


Here is what is actually going on:

git-format-patch does *not* set the charset when it formats the email. Without a charset specified, anybody handling the email according to the RFCs is supposed to treat the body as 7-bit ASCII. Thus patches which contain UTF-8 characters outside the range of 7-bit ASCII have the potential to be mangled because the content (8-bit UTF-8) does not match the content declaration (7-bit ASCII).
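
To make the default concrete, here is a minimal Python sketch using the standard library email package. The sample message and the names raw, msg, declared, assumed and body are made up for illustration. It only shows that a message without a Content-Type header carries no charset, so a receiver following the RFC default falls back to US-ASCII, and that UTF-8 bytes are not valid US-ASCII.

```python
from email import message_from_bytes

# Hypothetical message: no Content-Type header, but a UTF-8 encoded body.
raw = b"Subject: [PATCH] demo\n\n+name = J\xc3\xbcrgen\n"
msg = message_from_bytes(raw)

# No charset was declared, so the parser has nothing to report.
declared = msg.get_content_charset()
# A receiver following the RFC default then assumes US-ASCII.
assumed = declared or "us-ascii"

# Raw body bytes exactly as they arrived.
body = msg.get_payload(decode=True)
try:
    body.decode(assumed)
    ascii_ok = True
except UnicodeDecodeError:
    # The content does not match the (implicit) declaration.
    ascii_ok = False
```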

You can instruct git-format-patch to add arbitrary email headers. Thus we can force git-format-patch to provide the correct content declaration. This is best done by adding a format parameter to your ~/.gitconfig like this:


[format]
        headers = "Content-Type: text/plain;charset=\"utf-8\"\nContent-Transfer-Encoding: 8bit\n"

When you do this the headers at the top of your formatted patch will include:


Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

While that's a good start, *it's not enough*. Why?

Those extra email headers have to actually make it into the email headers being sent via SMTP. Just because those header lines are sitting in a file which might be attached to the email you're sending is not sufficient. Why? Because an attached patch becomes part of the email body, so the headers in the patch file never make it into the actual SMTP email headers.

Sending a patch as an attachment prevents the Content-Type from being inserted into the headers of the email. That's why our patches get their UTF-8 content screwed up when we send them as attachments.

So, is there a solution which causes the headers inserted by git-format-patch to become part of the actual email headers? Yes, it's called git-send-email. git-send-email is designed to send a git patch and as such it knows how to parse a patch formatted by git-format-patch. One of the things it does is look for email headers in the git-format-patch file and insert them into the SMTP headers for the email.
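
The idea of lifting headers out of the patch file can be sketched in Python. This is not git-send-email's actual code (that is a Perl script with its own parser); it only illustrates that a file produced by git-format-patch is laid out like an email, so its header block can be read back as real headers. The patch text, the address and the variable names are invented for the example.

```python
from email import message_from_string

# Hypothetical, abbreviated git-format-patch output that already carries
# the two extra headers configured via format.headers.
patch_text = (
    "From 0123456789abcdef0123456789abcdef01234567 Mon Sep 17 00:00:00 2001\n"
    "From: A Developer <dev@example.com>\n"
    "Subject: [PATCH] demo\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "commit message\n"
)

# Parse the patch file as if it were a mail message.
parsed = message_from_string(patch_text)

# The header block of the file is now available as message headers.
lifted = {name: parsed[name] for name in
          ("Subject", "Content-Type", "Content-Transfer-Encoding")}
charset = parsed.get_content_charset()
single_part = not parsed.is_multipart()
```

Because the result is a single-part message, the Content-Type found here describes the whole body.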


What happens when you send a patch as an attachment?

This causes the email body to become a collection of mime multiparts. Each mime part *should* have its own Content-Type declaration. Unfortunately neither "git-send-email --attach" nor Thunderbird, when attaching a patch, sets the charset parameter of the Content-Type declaration for the patch content. Remember the rules of the various RFCs: if the charset parameter is absent, the encoding of the content is to be interpreted as 7-bit ASCII. So any of our patches which contain UTF-8 can be mangled because we've violated the rules of email. We sent something which we implicitly declared was 7-bit ASCII but was actually 8-bit UTF-8!
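
Here is a small Python sketch of the difference between an attachment part with and without a charset parameter. It uses the standard library email.mime classes rather than anything git or Thunderbird does internally, and the patch content and variable names are made up.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.nonmultipart import MIMENonMultipart
from email.mime.text import MIMEText

patch = "+name = J\u00fcrgen\n"

# Attachment part built without a charset parameter (the problem case).
bare = MIMENonMultipart("text", "x-patch")
bare.set_payload(patch.encode("utf-8"))

# The same patch, with the charset declared on the part itself.
declared = MIMEText(patch, "x-patch", "utf-8")

msg = MIMEMultipart()
msg.attach(MIMEText("Patch attached.\n", "plain", "us-ascii"))
msg.attach(bare)
msg.attach(declared)

# Each mime part has its own Content-Type, and so its own charset (or none).
charsets = [part.get_content_charset() for part in msg.get_payload()]
```

The second entry in charsets is None: that part says nothing about its encoding, so a receiver is entitled to assume 7-bit ASCII for it.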

I could find no way to make Thunderbird use a specific Content-Type when sending a patch file. git-send-email with the attach option has the Content-Type hardcoded, which I consider a bug. Unfortunately I couldn't find a git bug reporting tool to report this bug.

If you have git-format-patch add the Content-Type and use git-send-email to send the patch, it will be *correct*. Why? Because git-send-email will insert the Content-Type into the SMTP header, which will apply to the *entire* email body because there are *no* mime multiparts as there would be if it were an attachment. If you do that, not only will the patch be sent correctly, but it will also display correctly in your email reader!

But isn't there another way to send the patch without getting its charset screwed up? Yes, you can send the patch as binary data (e.g. base64 encoded), which implies it must be an attachment.

What happens when you base64 encode the attachment? Basically it says to the mail handling components along the way "keep your grubby hands off, do not try to interpret this". That's both good and bad for us. It's good that the UTF-8 encoded patch makes it through the mail system unscathed, but mail readers have no clue how to properly display it; in fact they won't even try. That means you can't read the patch in your mail reader. You'll have to save it, which invokes the base64 decoding, and then you can open it as a file.

But wait a minute! I've seen base64 encoded patches on this email list and I can read the patch. You're lying! Well, that might be because of this email Stephen sent out a while ago.


> Stephen Gallagher wrote:
>

> The latest versions of git (including that shipped with Fedora 12)
> has some trouble parsing patch files sent through mailman that are
> encoded as "Content-Type: text/plain;"
>
> Thunderbird can be made to send all attachments in base64-encoded
> form (which should be safe for mailman) by changing the following
> settings.
>
> In Thunderbird Preferences, go to the Advanced->General tab. Select
> "Config Editor"
>
> Search for mail.file_attach_binary and set this value to true.
>
> Now all of your attachments will be base64-encoded. Yes, it increases
> the filesize somewhat, but accuracy > bandwidth.

So what's going on in this case? Let's follow the steps. You start by attaching a patch file, whose Content-Type is correctly determined to be "text/x-patch", then it's base64 encoded and sent as an attachment. On the receiving end the mime part containing the patch has these headers:


Content-Type: text/x-patch;
Content-Transfer-Encoding: base64

So the mail reader (Thunderbird in my instance) sees this was transferred as base64 and decodes it. Then it looks at the Content-Type and sees that it's text (but *without* the charset parameter). So the mail reader displays it as 7-bit ASCII. O.K. I lied a little. Thunderbird actually has a configuration which specifies the default charset if one is not found, which defaults to ISO-8859-1. This is somewhat in violation of the RFCs, but ISO-8859-1 has become the default in practice. So in this case Thunderbird tries to display the UTF-8 encoded patch as ISO-8859-1 text. This is a *display problem only*: the actual data is correct because it was sent as base64 and thus was never mucked with, but Thunderbird is displaying it incorrectly.
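
The "display problem only" point can be checked with a few lines of Python. This is just an illustration with a made-up patch line, not a model of what Thunderbird does internally: the bytes survive the base64 round trip untouched, and only the choice of charset at display time changes what you see.

```python
import base64

patch = "+name = J\u00fcrgen\n"

# What travels through the mail system: base64 text, opaque to everyone.
wire = base64.b64encode(patch.encode("utf-8"))
# What the receiver has after undoing the transfer encoding.
received = base64.b64decode(wire)

# Displayed with the fallback charset: garbled, two characters per UTF-8 pair.
shown = received.decode("iso-8859-1")
# Displayed as UTF-8: the original text.
actual = received.decode("utf-8")
```

Here shown contains "JÃ¼rgen" while actual contains "Jürgen", even though both come from exactly the same bytes.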

There is a manual workaround for this in Thunderbird. If you're looking at a patch which looks like it's rendered with the wrong encoding (i.e. charset) then go to the View --> Character Encoding menu and select UTF-8.

Let's go back a minute to Stephen's assertion that Mailman is screwing up the patches and we need to have Thunderbird base64 encode them to prevent mailman from mucking with them. This really isn't a mailman problem; rather, it's a problem with how we're sending patches. We've lied to mailman and all the other components which might handle the mail along the way. We told those mail systems the body of the mail was 7-bit ASCII (because we omitted the charset parameter in the mail header) and then inserted 8-bit UTF-8 into the mail body. That kind of lie won't bite you until one of the mail components decides to transcode the mail body. One of the features of mailman is transcoding to match the list's encoding preference. The fact that mailman corrupted the mail body is not mailman's fault, because we lied to mailman about the encoding of the mail body. The old saying holds true: "garbage in, garbage out".
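
To see why a transcoding step is where the lie bites, consider this Python sketch. It is not Mailman's code and I am not claiming Mailman uses this exact strategy; it is one plausible thing a component can do when it trusts an implicit US-ASCII declaration and then meets bytes that are not ASCII. The sample line and names are made up.

```python
original = "+name = J\u00fcrgen\n".encode("utf-8")

# The component believes the body is US-ASCII, so it decodes it as such.
# Bytes that are not valid ASCII have to go somewhere; here each one is
# swapped for the Unicode replacement character.
as_text = original.decode("ascii", errors="replace")

# Re-encoding for the list's preferred charset cannot bring the data back.
transcoded = as_text.encode("utf-8")
damaged = transcoded != original
```

Unlike the display problem with base64, this damage is permanent: the two bytes of the original character are gone and nothing on the receiving end can recover them.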


O.K. so what are our options here?

1) Continue to send patches the way we have, making sure Thunderbird is configured to base64 encode them. Accept the fact that when displayed in a mail reader any UTF-8 will be garbled and you have to manually force Thunderbird to render the patch in UTF-8. The contents of the patch remain uncorrupted; it's just a display issue in the mail reader.

2) Configure git-format-patch to add the correct headers and send the patch with git-send-email, which puts them into the SMTP headers. This is probably preferred because it's actually correct from an RFC standpoint.

Option 2 is actually pretty easy to use. My ~/.gitconfig is set up like this:


[sendemail]
        smtpserver = smtp.corp.redhat.com
        to = freeipa-devel@redhat.com
        from = John Dennis <jden...@redhat.com>
        confirm = never
[format]
        headers = "Content-Type: text/plain;charset=\"utf-8\"\nContent-Transfer-Encoding: 8bit\n"

Those defaults in my .gitconfig mean I never have to add any command line args to either git-format-patch or git-send-email. It's as easy as:


% git format-patch -1
% git send-email 0001-some-patch-file

The downside of using git-send-email is that whoever is applying the patch will have to save the entire email to a file instead of saving an attachment, which might be slightly more awkward. But as you can see from above, it's very hard, and in most cases impossible, to get a patch sent as an attachment to have the correct charset specified. This is a pretty serious shortcoming and calls into question the use of attachments in the first place.


--
John Dennis <jden...@redhat.com>

