Re: [Dbmail-dev] Re: PostgreSQL + non-ASCII encoding

Robert Fleming Tue, 1 Nov 2005 10:50:34 +0100 (CET)

Thomas,

Is the expense concern the computational complexity (escaping for SQL), the primary storage use (malloc'ing 5*READ_BLOCK_SIZE), or secondary storage use (not exploiting PostgreSQL's text compression)? Perhaps the first two of these could be addressed by using PostgreSQL's parameterized query execution functions. As for secondary storage use, my understanding is that both "text" and "bytea" can be stored either compressed or uncompressed (at the operator's discretion).
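To make the 5*READ_BLOCK_SIZE figure concrete, here is a minimal sketch of why escaping a binary block for inclusion in a SQL string literal can need up to five output bytes per input byte. It is a simplified model of the octal "escape" style used for bytea literals, not the code from the patch or from libpq; the function name escape_block and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of dbmail or libpq): escape a raw block
 * so it can sit inside a quoted SQL literal destined for a bytea column.
 * Simplified model of the octal escape style:
 *   non-printable byte -> \\ooo   (5 chars, the worst case)
 *   backslash          -> \\\\    (4 chars)
 *   single quote       -> ''      (2 chars)
 *   anything else      -> itself  (1 char)
 * The caller must supply room for the worst case: 5 * len + 1 bytes. */
size_t escape_block(const unsigned char *in, size_t len,
                    char *out, size_t out_cap)
{
    size_t n = 0;
    size_t i;

    assert(out_cap >= 5 * len + 1);

    for (i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x20 || c > 0x7e) {
            out[n++] = '\\';
            out[n++] = '\\';
            out[n++] = (char)('0' + ((c >> 6) & 3));
            out[n++] = (char)('0' + ((c >> 3) & 7));
            out[n++] = (char)('0' + (c & 7));
        } else if (c == '\\') {
            out[n++] = '\\';
            out[n++] = '\\';
            out[n++] = '\\';
            out[n++] = '\\';
        } else if (c == '\'') {
            out[n++] = '\'';
            out[n++] = '\'';
        } else {
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
    return n; /* number of characters written, excluding the terminator */
}
```

With a parameterized call such as libpq's PQexecParams, the block can instead be passed as a binary parameter alongside the query text, so neither the escaping pass nor the oversized buffer should be needed.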

The patch I sent before was indeed sub-optimal w.r.t. performance, but fortunately was optimal w.r.t. my level of effort, portable among dbmail revisions, and adequate w.r.t. my performance requirements (I am my DB's sole user ;).

Regarding message body searching: it seems to me that searching a "text" column has the same complexities as searching "bytea". In both cases, it would be necessary to transcode the search string (or the message block) so that the two encodings match. (As you mentioned, switching entirely to UTF-8 would optimize for search, at the expense of most other operations.)
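As an illustration of the transcoding step, here is a small sketch that converts a search string into the encoding of the stored message block before comparing. It uses the POSIX iconv interface; the helper name transcode, its signature, and the choice of ISO-8859-1 and UTF-8 in the usage note are assumptions for the example, not anything dbmail does today.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from
 * encoding `from` to encoding `to`, writing a NUL-terminated result
 * into `out`. Returns 0 on success, -1 on any failure (unknown
 * encoding, invalid input, or output buffer too small). */
int transcode(const char *from, const char *to,
              const char *in, char *out, size_t out_cap)
{
    iconv_t cd;
    char *inp = (char *)in;   /* iconv() takes a non-const pointer */
    char *outp = out;
    size_t inleft = strlen(in);
    size_t outleft;
    size_t rc;

    if (out_cap == 0)
        return -1;
    outleft = out_cap - 1;    /* keep one byte for the terminator */

    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outp = '\0';
    return 0;
}
```

For example, transcode("ISO-8859-1", "UTF-8", key, buf, sizeof buf) would prepare a Latin-1 search key for matching against a UTF-8 message block. The same step is needed whether the column is "text" or "bytea"; only where the conversion happens differs.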


Regards,
Robert

P.S. My selfish desire is for dbmail to not compel a specific DB encoding, because I'm combining dbmail and other data in the same DB; I've patched dbmail to segregate its tables into its own schema.


Thomas Mueller wrote:

Hi Robert,
Here's the patch..
I don't think that's a solution but an (expensive) workaround. The problem is: Postgres compresses text but stores bytea as it is.
In future it would be interesting to search on message bodies as well.

What about converting every string to UTF-8 and using a UTF-8 database?
Paul, is it hard to convert an incoming message to UTF-8 and use UTF-8 everywhere?
Thomas
