Thomas,
Is the expense concern the computational complexity (escaping for SQL),
the primary storage use (malloc'ing 5*READ_BLOCK_SIZE), or secondary
storage use (not exploiting PostgreSQL's text compression)? Perhaps the
first two of these could be addressed by using PostgreSQL's
parameterized query execution functions. As for secondary storage use,
my understanding is that both "text" and "bytea" can be stored
compressed and uncompressed (at the operator's discretion).
The patch I sent before was indeed sub-optimal w.r.t. performance, but
fortunately was optimal w.r.t. my level of effort, portable among dbmail
revisions, and adequate w.r.t. my performance requirements (I am my DB's
sole user ;).
Regarding message body searching: it seems to me that searching a "text"
column has the same complexities as searching "bytea". In both cases,
it would be necessary to transcode the search string (or the message
block) to matching encodings. (As you mentioned, switching entirely
UTF-8 would optimize for search -- at the expense of most other operations.)
Regards,
Robert
P.S. My selfish desire is for dbmail to not compel a specific DB
encoding, because I'm combining dbmail and other data in the same DB --
I've patched dbmail to segregate its tables into its own schema.
Thomas Mueller wrote:
Hi Robert,
Here's the patch..
I don't think that's a solution but a (expensive) workaround. The
problem is: Postgres compresses text but stores bytea as it is.
In future it would be interesting to search on message bodies as well.
What about converting every string to UTF-8 and use a UTF-8 database?
Paul is it hard to convert a incoming message to UTF-8 and use UTF-8
everywhere?
Thomas