>> The Gmail and Hotmail support handles other people's UTF-8 addresses
>> in mail but they still don't provide UTF-8 addresses on their own
>> systems.
>
> From what I can tell, Gmail and outlook.com's support is basically "just send
> UTF-8", that is, it will send EAI messages without the server offering the
> extension.

I know the people involved and can check.

> I agree that this isn't difficult. What's difficult is keeping track of the
> EAI-ness of a message as it goes through processing like alias expansion,
> which can turn a non-EAI message into an EAI message or vice versa.

> Support for the nested encodings message/global creates may also be
> nontrivial.

I don't even try.  In the places where it matters, I scan the envelope and
message headers for characters with the high bit set.
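A minimal sketch of the high-bit scan described above, assuming the envelope addresses and the raw message are available as bytes. The function names are hypothetical and not taken from any particular MTA:

```python
import re
from typing import Iterable


def has_high_bit(data: bytes) -> bool:
    """True if any octet in data has the high bit set (i.e. is non-ASCII)."""
    return any(octet & 0x80 for octet in data)


def outer_header_block(raw_message: bytes) -> bytes:
    """Return everything before the first blank line (the outer header)."""
    return re.split(rb"\r?\n\r?\n", raw_message, maxsplit=1)[0]


def looks_like_eai(mail_from: bytes, rcpt_to: Iterable[bytes],
                   raw_message: bytes) -> bool:
    """Crude test: high-bit octets in the envelope or the outer header.

    The body and any nested MIME part headers are deliberately not examined.
    """
    envelope = [mail_from, *rcpt_to]
    if any(has_high_bit(addr) for addr in envelope):
        return True
    return has_high_bit(outer_header_block(raw_message))
```

Note that a message whose only non-ASCII content is in the body is not flagged, which matches the "envelope and message headers" scope of the check.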

This isn't a place where you care about message/global, since the presence
of a message/global object doesn't make the message an EAI message. Indeed,
the only good thing about message/global is that it presents as an opaque
container that allows you to tuck one EAI message inside another without
requiring transport level EAI processing.

But this doesn't mean that MTAs, or more precisely MSAs, can get away with
not processing message/global. Message fixups on SUBMIT need to reach
inside, and since some of them may be legally mandated, it's not something
we can ignore.

As for checking for high bits in the outer header, that ignores
utf-8 in nested MIME parts. It isn't clear to me what the consequences of
that are going to be.
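To make the gap concrete, here is a rough sketch using Python's standard `email` package that compares an outer-header-only scan with one that walks every nested part. It assumes nested messages arrive unencoded (a message/global part carried in base64 or quoted-printable would need decoding first, which this does not attempt), and the function names are hypothetical:

```python
from email import message_from_bytes


def has_non_ascii(text: str) -> bool:
    # The bytes parser decodes as ASCII with surrogateescape, so raw high-bit
    # octets show up as code points above 0x7F.
    return any(ord(ch) > 0x7F for ch in text)


def utf8_header_names(part) -> list:
    """Names of header fields in this one part that contain non-ASCII."""
    return [name for name, value in part.raw_items()
            if has_non_ascii(name) or has_non_ascii(value)]


def outer_scan(raw: bytes) -> list:
    """Look only at the outermost header."""
    return utf8_header_names(message_from_bytes(raw))


def deep_scan(raw: bytes) -> list:
    """Look at the header of every part, including nested message/* parts."""
    found = []
    for part in message_from_bytes(raw).walk():
        found.extend(utf8_header_names(part))
    return found
```

For a message whose only UTF-8 header sits inside a message/global attachment, `outer_scan` reports nothing while `deep_scan` reports the nested field.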

This is wrong, but it doesn't seem much wronger than far more complex
approaches.  Haven't thought too much about message/global but in the MTAs
I use, it's only a MUA problem.

MTAs, maybe. But your typical MTA also acts as an MSA.

>> The hardest part, which I haven't done yet, is generalizing
>> the address mapping that MTAs do on incoming mail. ...

> This I frankly don't care about, as I believe that doing it in a meaningful
> language-specific way is impossible.

I meant interpreting addresses in mail to my own mailboxes, the
generalized version of case folding and subaddresses.  Maybe you're right
that undotted i's won't work in a lot of places, but I'd be surprised if
they didn't work in Turkey.

First of all, undotted i's are going to work everywhere as long as you don't
expect to be able to switch case. We're only talking about having things work
when someone makes a case change when entering an address.

And in such cases, prepare to be surprised, because the odds are astronomically
small that it will work. Even in Turkey.

It's possible - I think - to code a solution where different case conversions
are applied on a per-domain basis. And ISPs/MSPs might even be willing to
run such software preferentially and configure it correctly for
their own domains.
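As a rough illustration of what per-domain case conversion might look like, here is a sketch with a lookup table keyed by domain. The table contents, the domain `ornek.example`, and the choice of plain lowercasing as the fallback are all assumptions made for the example; nothing in the standards mandates any of them:

```python
# Turkish casing pairs I with dotless ı, and dotted İ with i.
TURKISH_LOWER = {ord("I"): "ı", ord("İ"): "i"}


def turkish_fold(local_part: str) -> str:
    """Lowercase using the Turkish dotted/dotless i rules."""
    return local_part.translate(TURKISH_LOWER).lower()


def default_fold(local_part: str) -> str:
    """Locale-independent lowercasing, used when nothing is configured."""
    return local_part.lower()


# Hypothetical per-domain configuration an operator would have to maintain.
FOLD_BY_DOMAIN = {
    "ornek.example": turkish_fold,
}


def canonical_local_part(address: str) -> str:
    """Fold the local part using whatever rule is configured for the domain."""
    local_part, _, domain = address.rpartition("@")
    fold = FOLD_BY_DOMAIN.get(domain.lower(), default_fold)
    return fold(local_part)
```

With this table, `ISIK@ornek.example` folds to `ısık`, while the same local part at an unlisted domain folds to `isik`, so whether two spellings match depends entirely on the domain being listed.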

But what about other Turkish domains? Is someone going to maintain a list of
them all so that address matching can be done properly? And will it be kept up to date?

And even if this all came to pass, do per-domain case rules actually work?
Are domains aligned this way? I'm fairly sure the answer to this question is
"no".

And while considering all of this, keep in mind that the standards punted on
this issue. There's no requirement that any normalization be done, let alone
that case-insensitivity be provided.

Besides, I have a sneaking suspicion that those who take the step of offering
addresses in scripts that have these issues are going to be so busy dealing
with visual similarity, address fakery, and similar issues that we'll be lucky
if they do any sort of normalization at all, let alone dive deep into
the rabbit hole.

                                Ned

