Hi Robert,

> Is the expense concern the computational complexity (escaping for SQL),
> the primary storage use (malloc'ing 5*READ_BLOCK_SIZE), or secondary
> storage use (not exploiting PostgreSQL's text compression)?  Perhaps the
> first two of these could be addressed by using PostgreSQL's
> parameterized query execution functions.  As for secondary storage use,
> my understanding is that both "text" and "bytea" can be stored
> compressed and uncompressed (at the operator's discretion).

I only meant the storage requirements.

Datatype 'text'
http://www.postgresql.org/docs/8.0/interactive/datatype-character.html:
"Long strings are compressed by the system automatically, so the
physical requirement on disk may be less."

Datatype 'binary'
http://www.postgresql.org/docs/8.0/interactive/datatype-binary.html says
nothing about compression. Probably because the size reduction/cpu cost
is much worse than for text fields.

Unfortunately Moores Law doesn't apply to disc io, there is only very
little improvement every year, so io becomes more and more a bottleneck.
That's why it's a good idea to increase cpu usage to lower io (compression).

> The patch I sent before was indeed sub-optimal w.r.t. performance, but
> fortunately was optimal w.r.t. my level of effort, portable among dbmail
> revisions, and adequate w.r.t. my performance requirements (I am my DB's
> sole user ;).

The bigger database and increased io is no problem for a single user of
course, but I have a database that is >> 200 GB. I'm quite sure there
would be a performance loss if I would switch to bytea instead of text.

> Regarding message body searching: it seems to me that searching a "text"
> column has the same complexities as searching "bytea".  In both cases,
> it would be necessary to transcode the search string (or the message
> block) to matching encodings.  (As you mentioned, switching entirely
> UTF-8 would optimize for search -- at the expense of most other
> operations.)

I thought about fulltext search engines like tsearch2, they only work on
text fields.

A normal SELECT on a mailbox would be a real bad idea, because the
database would have to read all message bodies from disc. My mailbox has
10 GB, that's lot of io :)


Thomas

Reply via email to