On 1/24/11 4:52 PM, Mark Martinec wrote:
What MySQL makes of such data is up to the MySQL client and server
libraries, but Postfix does not promise that the input will be well-formed
UTF-8, or ISO Latin or anything of the sort. Just an array of bytes.
Right, as it should be. Envelope addresses are not associated with any
character set according to RFC 5321, they are just strings of octets.

Urgh. Which RFC are you reading?

I quote:

Systems MUST NOT define mailboxes in such a way as to require the use
   in SMTP of non-ASCII characters (octets with the high order bit set
   to one) or ASCII "control characters" (decimal value 0-31 and 127).
   These characters MUST NOT be used in MAIL or RCPT commands or other
   commands that require mailbox names.


"MUST BE 7-bit clean ASCII" sounds like a definite encoding to me.
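
As a rough illustration of the rule quoted above, here is a minimal Python sketch that checks whether a raw envelope address stays within the permitted octet range. The function name is_rfc5321_clean is made up for this example; it only tests the octet values (no high bit set, no control characters) and does not validate mailbox syntax.

```python
def is_rfc5321_clean(mailbox: bytes) -> bool:
    """Return True if every octet is printable 7-bit ASCII.

    Hypothetical helper for illustration only: it rejects octets with
    the high-order bit set (128-255), ASCII control characters (0-31)
    and DEL (127), which is the range the quoted RFC 5321 text forbids.
    It does not check that the bytes form a syntactically valid mailbox.
    """
    return all(32 <= octet <= 126 for octet in mailbox)
```

A lookup table could apply a check like this to incoming keys before handing them to the database, rather than relying on the database to reject them.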

If you must, and both SMTP client and server can handle it, parts of the conversation may be MIME-encoded.
Envelope information is not one of those parts.


For this reason an appropriate SQL data type for such fields is
VARBINARY (or BYTEA in PostgreSQL).  A data type CHAR or VARCHAR
is inappropriate, as it associates a character set with data.  SQL may
perform validation of data according to the specified character set.
MySQL tends to be quite permissive about such violations, but there is
no guarantee. Also, comparing CHAR or VARCHAR strings with
relational operators is case-insensitive under MySQL's default
collations and may even apply special (Unicode-style) rules for
character equivalence.

   Mark


--
J.
