Hi all,

For a very long time postgresql users have complained about charset
incompatibilities in dbmail.

Typically errors like "invalid byte sequence for encoding ..."  would
break delivery of messages when the string being inserted in the
database was differently encoded than postgres was expecting for the
column involved.
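
To make the failure mode concrete, here is a minimal sketch in Python (not
dbmail code; the helper name is made up for illustration). It shows why
bytes written in one charset are rejected by a consumer that expects UTF-8,
which is roughly the check behind the "invalid byte sequence" error:

```python
# "Café" as a Latin-1 mail client would put it on the wire.
latin1_bytes = "Café".encode("latin-1")
# The same text as a UTF-8 column expects it.
utf8_bytes = "Café".encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if raw is a well-formed UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(is_valid_utf8(latin1_bytes))
    print(is_valid_utf8(utf8_bytes))
```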

For the messageblks table this was resolved by converting the messageblk
column to the BYTEA type.

However, I don't want to do that for dbmail_headervalue.headervalue
and dbmail_subjectfield.subjectfield, for two reasons: I want to be able to
use these two fields reliably for sorting, and I don't know how BYTEA
columns behave when indexed. Also, this is the kind of change I am most
reluctant to make so shortly before a major release.

So, I came up with a solution I want to run by you.

I've just landed a change that converts all strings inserted into the
headervalue and subjectfield columns to UTF-8 using gmime's iconv
facilities. The subject and address parts of the envelope are encoded as
7-bit ASCII (RFC 2047 encoded-words), which also makes it safe to insert
them into utf8 tables regardless of the original charset encoding.
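
The idea can be sketched as follows. This is Python standing in for the C
code, not the actual change: dbmail goes through gmime's iconv wrappers,
while the sketch uses Python's codecs and email.header module. The function
names and the fallback order (declared charset, then UTF-8, then Latin-1)
are assumptions made for illustration only.

```python
from email.header import Header, decode_header
from typing import Optional


def to_utf8(raw: bytes, charset: Optional[str] = None) -> bytes:
    """Re-encode raw header bytes as UTF-8 (hypothetical helper).

    Tries the declared charset first, then UTF-8 itself. If both fail,
    falls back to Latin-1, which maps every byte to some character, so
    the result is always valid UTF-8 even if the guess was wrong.
    """
    for name in (charset, "utf-8"):
        if not name:
            continue
        try:
            return raw.decode(name).encode("utf-8")
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset label: try the next candidate.
            pass
    return raw.decode("latin-1").encode("utf-8")


def rfc2047_encode(text: str) -> str:
    """Encode text as RFC 2047 encoded-words; the result is pure ASCII."""
    return Header(text, "utf-8").encode()


if __name__ == "__main__":
    koi8 = "Привет".encode("koi8-r")
    print(to_utf8(koi8, "koi8-r").decode("utf-8"))
    print(rfc2047_encode("Café"))
```

Either way, what reaches the database is valid in a utf8 table: the first
helper yields UTF-8 bytes, the second yields plain ASCII, which is a subset
of UTF-8.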

Pro:
- we don't need to alter the schema.
- imap-sort behaves as expected.

Con:
- this means starting from 2.2.0 dbmail expects tables to use utf8 encoding.

I haven't tested yet how this new behaviour affects people using non-utf8
encoded tables, like latin1 or koi8. People with experience in these
matters are invited to speak up.

Also: is there a procedure to change the encoding on a table in postgresql?

-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
