On Thursday 26 February 2009 Paul J Stevens wrote:
> Michael Monnerie wrote:
> > On Tuesday 24 February 2009 Michael Monnerie wrote:
> >> As we can drop dbmail_headervalue_3 index anyway, drop that 255
> >> char field also, and store only the full headervalue. Use that
> >> nice compressing technique Niki implemented already, but without
> >> hash. That might be more overhead than searching the whole table.
> >> If it needs be used, use a hash as short as possible to save
> >> storage. A cheap md5 hash should be enough, maybe less is
> >> possible.
>
> I propose we drop the index, but keep the hash as a varchar.

If it's computed over the full-length line, it could be useful for 
detecting duplicate values at INSERT time. But is that its only 
purpose, or is there any other use?

The question is: is it worth the effort? If the hash is reasonably 
short, I guess yes. It should be limited to 16 bytes (to save disk 
space) and the column should allow duplicates, because a hash 
collision doesn't matter here when you compare the full text 
afterwards. You can simply run 
SELECT ... WHERE hashfield='computed_hash' AND headervalue='new_line'
and the db can use the index on hashfield to find only the 1-2 rows 
whose hash matches, and then compare contents using the full line.
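
The lookup described above could be sketched roughly as follows. This is 
only an illustration under assumptions, not DBMail's actual code: the 
table name headervalue, the column names, the index name and the helper 
functions are all made up for the example, SQLite stands in for the real 
backend, and the hash is stored as a raw 16-byte MD5 digest in a BLOB 
rather than as a varchar.

```python
import hashlib
import sqlite3


def header_hash(value: str) -> bytes:
    """Return the raw MD5 digest of the full header value (always 16 bytes)."""
    return hashlib.md5(value.encode("utf-8")).digest()


def open_store() -> sqlite3.Connection:
    """Create an in-memory table with a NON-unique index on the hash column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE headervalue ("
        " id INTEGER PRIMARY KEY,"
        " hashfield BLOB NOT NULL,"
        " headervalue TEXT NOT NULL)"
    )
    # Duplicates are allowed on purpose: a collision only means that two
    # rows share a hash, and the full-text comparison tells them apart.
    conn.execute("CREATE INDEX headervalue_hash_idx ON headervalue (hashfield)")
    return conn


def store_header_value(conn: sqlite3.Connection, value: str) -> int:
    """Return the id of an existing identical value, or insert a new row."""
    h = header_hash(value)
    # The index narrows the search to the few rows with this hash; the
    # second condition compares the full line to rule out collisions.
    row = conn.execute(
        "SELECT id FROM headervalue WHERE hashfield = ? AND headervalue = ?",
        (h, value),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO headervalue (hashfield, headervalue) VALUES (?, ?)",
        (h, value),
    )
    return cur.lastrowid
```

Calling store_header_value() twice with the same line returns the same 
id, so the full value is stored only once. A shorter hash would work the 
same way; it would just leave more candidate rows for the full-text 
comparison.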

BTW: Do you allow hash collisions in the single-instance store of the 
messageparts? I guess yes.

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4
