Re: [Dbmail-dev] Re: [DBMail 0000754]: single instance storage for headervalues

Michael Monnerie Fri, 06 Feb 2009 02:40:41 -0800

On Donnerstag 05 Februar 2009 Paul J Stevens wrote:
> >  id          | bigint         | not null default
> > nextval('dbmail_headervalue_id_seq'::regclass)
> >  headervalue | character varying (255) | default null
> >  headervalue_full | text           | default null
> > Indexe:
> >     »dbmail_headervalue_pkey« PRIMARY KEY, btree (id)
> >     »dbmail_headervalue_1« btree (headervalue)
> >
> > And then fill the real headervalue into "headervalue" if it's <=255
> > chars, and only to headervalue_full if it's >255 chars long.
>
> Without the hash, how are you going to locate matching
> headervalue_full rows without doing full table scans using HAVING?


I see. What about this: For headervalues that are really > 255 chars, 
store the first 255 chars into headervalue. So at least when using 
HAVING you can reduce it to entries "WHERE 
headervalue=substring(the_new_long_headervalue,0,255)". That should 
leave very few headers that fit.

> > *) save 256 hash bytes per headervalue entry (for me with 2.8 mio.
> > entries thats about 700MB on disk).
>
> The hash doesn't have to be 256 wide. Most algorithms will generate
> much smaller strings.

So it should be defined varchar instead character to save the space. 
Isn't character always stored in the defined length, while varchar saves 
all possible space? At least for PostgreSQL:
http://www.postgresql.org/docs/8.3/interactive/datatype-character.html

> Also, the number of headervalue rows will be less than you have now.
> Just a detail.

Yes but I don't know how much is saved. And my other system could have 
much more because there are a lot more users there.

> Yes, but I'd like to avoid full scans on insertions.

Right. That wouldn't be good for performance. What about increasing the 
headervalue length to the max. possible. On PostgreSQL 8.3:
create table x ( y varchar(1000000000) );
ERROR:  length for type varchar cannot exceed 10485760
So a length of 10MB is possible for a simple varchar:
create table x ( y varchar(10485760) );
CREATE TABLE
That would for sure be enough for a single header value. What do the 
other databases have? If it's long enough to store a headervalue, simply 
use varchar and all hassles are gone.

> > Sorry that I have this idea only now, but it could really save lots
> > of disk space and CPU time, and improve speed even further. I don't
> > know the amount of work that would mean for you, but I can imagine
> > it's relatively straightforward.
>
> I'd rather see headervalue_full be stored into a separate table. That
> way you can clean that table out every now and then. Or maybe even
> make it completely optional.

Could be OK, but please let's look if the 10MB strlen of PostgreSQL are 
also possible in the other supported DBs. Or at least, if the max 
possible would be long enough for a headervalue. What's the max here 
anyway? 4KB? 64KB? Couldn't we just cut the headervalue at 4KB? After 
all, this is only used for searching, right?

mfg zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4

_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://twister.fastxs.net/mailman/listinfo/dbmail-dev

Re: [Dbmail-dev] Re: [DBMail 0000754]: single instance storage for headervalues

Reply via email to