Let me first say that I have been googling, testing, tcpdumping, and in-general-studying for the past 2 days trying to get my autoresponder to display something other than question marks when it encountered multi-byte characters. In the end, I appear to have solved my problem, but I wanted to go through the discovery process, soliciting information from others who may have devised a better solution.
Problem: customer sets his autoresponder to have foreign language characters (Simplified Chinese if it matters) in the subject field and the message field of an autoresponder for their mailbox. When the autoresponse happens, it replaces all non-ASCII characters with a question mark. 1. From the mysql command prompt, the same query returns the correct characters, so I knew the query was ok. At this point, I happened to google for just the right thing and found a ML post from Philip Hazel which said (authoritatively) that exim will not "convert" anything that it receives from mysql. 2. I figured I needed to set some headers so that MUAs would render it properly. I googled and found an old ML post about autoresponders and the MIME headers that were added. I added them to mine but it still made absolutely zero difference. 3. I did a tcpdump of the data being transmitted from the database server to the mail server. It was actually question marks, not the characters. So it was mysql who was converting the multi-byte characters to question marks before being sent to exim. Turns out mysql defaults to latin1 character set if you don't specify otherwise, so I need to specify the character set. More googling and I found this illuminating message: http://www.gossamer-threads.com/lists/exim/users/86382 This showed me that I only needed to tell the mysql connection that it was to use UTF-8 since that's the format we store data in our database. 4. Our system already has default character set utf-8 in mysql on all machines, so settings in my.cnf were already present and didn't solve any issues for me. 5. I changed the query to be like the above ML post specifying two sequential lookups. It worked, but then revealed another problem: Expansion of "=${extract{subject}{${lookup mysql{SET NAMES utf8}{}}${lookup mysql {SELECT ea.active,ea.frequency,ea.subject,ea.message FROM email_autoresponder AS ea JOIN email AS e ON e.autoresponder_id=ea.id JOIN domain AS d ON d.id=e.domain_id WHERE e.name='${quote_mysql:${local_part}}' AND d.name='${quote_mysql:${domain}}' AND e.active=1 AND e.email_type_id=1 } }}{$value}{Auto-Response: ${quote:${local_part}}@${quote:${domain}} will respond to your email soon}}" in autoresponder_always_t transport contains non-printing character 232 6. I found another post by Philip Hazel with a one line fix for the above: http://www.mail-archive.com/exim-users@exim.org/msg13613.html But using the rfc2047 operator didn't work. It did something to my value that made everything default to the default text. I don't quite understand what failed, and frankly didn't throw much brainpower at it, as by this time I had developed an alternative approach. 7. Perl. It's weird how perl is just the solution to everything for me. A single perl function with one variable passed to (the field) that makes a DBI call to get the autoresponse data for the user for both the subject and text. 8. The subject field complained about the non-printing character, so wrapped it in the rfc2047 operator. Now everything works and renders properly. Yay! The perl function I ended up with is: use DBI; sub get_autoresponder_data { my $field = shift(); my $default = shift(); my $local_part = lc(Exim::expand_string('$local_part')); my $domain = lc(Exim::expand_string('$domain')); my $hostname = 'localhost'; my $dbh = DBI->connect("dbi:mysql:ivenue:$hostname",'censored','censored'); return($default) unless ($dbh->ping); $dbh->do('SET NAMES utf8'); my $query = "SELECT ea.$field FROM email_autoresponder AS ea " . "JOIN email AS e ON e.autoresponder_id=ea.id " . "JOIN domain AS d ON d.id=e.domain_id " . "WHERE e.name=" . $dbh->quote($local_part) . " AND " . "d.name=" . $dbh->quote($domain) . " AND " . "e.active=1 AND " . "e.email_type_id=1"; my $result = $dbh->selectall_arrayref( $query ); $dbh->disconnect(); if ($result->[0]) { return( $result->[0]->[0] ); } return($default); } Then a few macros: AUTORESPONDER_TEXT = ${perl{get_autoresponder_data}{message}{AUTORESPONDER_DEFAULT_TEXT}} AUTORESPONDER_SUBJECT = ${rfc2047:${perl{get_autoresponder_data}{subject}{AUTORESPONDER_DEFAULT_SUBJECT}}} The transport looks like this: autoresponder_always_t: driver = autoreply log = USER_MAILDIR/vacation.log from = ${quote_local_part:$local_part}\@${quote:$domain} to = AUTORESPONDER_TO text = AUTORESPONDER_TEXT subject = AUTORESPONDER_SUBJECT headers = "Precedence: junk\nContent-Type: text/plain; charset=UTF-8\nContent-Transfer-Encoding: 8bit" user = MAIL_USER group = MAIL_GROUP If there any any comments or suggestions about my process above (or even "this should have worked" type comments), I would appreciate hearing about it. I just wanted to put something in the ML archives that detailed both the discovery process and the solution of how I was able to get multi-byte characters (in subject and message body) to work in my autoresponder system. -- Regards... Todd I seek the truth...it is only persistence in self-delusion and ignorance that does harm. -- Marcus Aurealius -- ## List details at http://lists.exim.org/mailman/listinfo/exim-users ## Exim details at http://www.exim.org/ ## Please use the Wiki with this list - http://wiki.exim.org/