Re: accented characters in e-mail addresses

Markus Wiederkehr Fri, 27 Mar 2009 06:27:08 -0700

On Fri, Mar 27, 2009 at 9:52 AM, Robert Burrell Donkin
<[email protected]> wrote:
> On Fri, Mar 27, 2009 at 12:20 AM, Ondrej Bojar <[email protected]> wrote:
>> Dear Markus,
>>
>> thanks for the explanation.
>>
>> From this I understand that the bug is in the way Mime4j is called from K-9
>> (and Google's original Email client). Mime4j is meant for parsing header
>> fields as they arrive, that is following the appropriate RFC for MIME.
>> Mime4j is not intended for validation of header fields as they are presented
>> to (or in my case entered by) the user.
>
> one of the problems with the RFCs is that the IETF working group
> actively excludes use cases like this which concern mail processing
> rather than mail transport. they have specific rules to be applied
> when streaming bytes from a socket which are often unreasonable or
> inconvenient in these cases.
>
> IMHO a good MIME library should be able to handle both. some encodings
> would be tricky but MIME headers should be 8-bit clean so UTF-8 should
> be reasonably straightforward.


I think in this case there is no need to deal with bytes or character
encodings because the encoded words have already been decoded and the
address has already been converted to a Java string.

But yes, Mime4j should be capable of parsing an address that contains
special characters in the "name" part. And I think in this case the
phrases should automatically be encoded into encoded words so that the
address may be used for transport.
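
As a rough illustration of that idea (plain Java, not an existing Mime4j
API; the class and method names are made up for this sketch and the
addresses are placeholders), only the phrase would be turned into an
encoded word while the angle-addr stays untouched. The sketch always
uses UTF-8 with "B" encoding, and it ignores both the 75-character limit
on encoded words and the quoting of ASCII specials:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Mime4j.
class PhraseEncoderSketch {

    // Returns the phrase unchanged if it is pure ASCII; otherwise wraps
    // its UTF-8 bytes in a single RFC 2047 "B" encoded word.
    static String encodePhrase(String phrase) {
        boolean ascii = phrase.chars().allMatch(c -> c < 128);
        if (ascii) {
            return phrase;
        }
        byte[] utf8 = phrase.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(utf8) + "?=";
    }

    // Builds "phrase <addr-spec>", encoding only the phrase.
    static String formatMailbox(String phrase, String addrSpec) {
        return encodePhrase(phrase) + " <" + addrSpec + ">";
    }

    public static void main(String[] args) {
        // Each mailbox is formatted on its own, so the separating comma
        // and the addr-specs never end up inside an encoded word.
        String list = String.join(", ",
                formatMailbox("Hans", "hans@example.org"),
                formatMailbox("Hans M\u00fcller", "hans.mueller@example.org"));
        System.out.println(list);
    }
}
```

Running this prints
Hans <hans@example.org>, =?UTF-8?B?SGFucyBNw7xsbGVy?= <hans.mueller@example.org>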

We'd probably have to change AddressListParser.jj for that. Currently
it has these rules:

void name_addr() :
{}
{
        phrase() angle_addr()
}

void phrase() :
{}
{
(       <DOTATOM>
|       <QUOTEDSTRING>
)+
}

... which is very strict: a phrase may consist only of <DOTATOM> and
<QUOTEDSTRING> tokens.

>> Is there a method in Mime4j to encode UTF-8 to the 'encoded word' =?...?=?
>> (I guess there is not.) Such a method would have to correctly handle *lists*
>> of 'decoded' addresses and not create e.g.
>>
>> =?ISO-8859-1?Q?Hans_=3Chans=40acme.org=3E,_Hans_M=FCller?=
>> <[email protected]>
>>
>> from
>>
>> Hans <[email protected]>, Hans Müller <[email protected]>
>
> encoding then decoding seems a little unnecessary. i think a
> configuration setting (offline mode, perhaps) allowing the header
> character set to vary would be a more elegant way to support this use
> case.

I think K-9 decodes the address in order to display it to the user and
then wants to use that decoded address when it creates the reply
message.

In this particular use case decoding and re-encoding makes sense.
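
The decoding half of that round trip could look roughly like the
following. This is only a sketch in plain Java, not what K-9 or Mime4j
actually do; the class name is invented and the address is a
placeholder. It handles "B" and "Q" encoded words, but it does not drop
the whitespace between adjacent encoded words as RFC 2047 requires, and
it will throw on malformed input or unknown charsets:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mime4j or K-9.
class EncodedWordDecoderSketch {

    // =?charset?encoding?encoded-text?=
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([bBqQ])\\?([^?]*)\\?=");

    // Replaces every encoded word in the text with its decoded form.
    static String decode(String text) {
        Matcher m = ENCODED_WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Charset charset = Charset.forName(m.group(1));
            byte[] bytes = m.group(2).equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(m.group(3))
                    : decodeQ(m.group(3));
            m.appendReplacement(out,
                    Matcher.quoteReplacement(new String(bytes, charset)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // "Q" encoding: '_' stands for a space, "=XX" for the byte 0xXX.
    private static byte[] decodeQ(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '_') {
                buf.write(' ');
            } else if (c == '=' && i + 2 < s.length()) {
                buf.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                buf.write(c);
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decode(
                "=?ISO-8859-1?Q?Hans_M=FCller?= <hans.mueller@example.org>"));
    }
}
```

The decoded string is what the user sees and edits; when the reply is
created, the name would have to be encoded again before the address
goes back into a header.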

Markus
