On Freitag 13 Februar 2009 Paul J Stevens wrote:
> Michael Monnerie wrote:
> > Aren't most searches of the form "%searchstring%"? If yes,
>
> All text-searches (against headervalues or messageblks/mimeparts) are
> like that. Searches against XXXfield tables are exact, though
> possibly case-insensitive.
>
> > the index is useless, as you can only search "searchstring%" to use
> > the index. And if the index can't be used, it could be dropped
> > altogether. I did some searches with Kmail, the string I searched
> > for shows what I told Kmail to search (ZMI_*, e.g. ZMI_SUBJECT
> > searched for subject)
>
> All headers searches are performed against the headervalue table. The
> subjectfield table is for thread=orderedsubject only. The other
> xxxfield tables are used for sorting. So I guess you are correct: the
> index on dbmail_headervalues.headervalue might as well be dropped.
OK, I wrote that on the performance Wiki FAQ. Maybe those 2 indices
could be dropped for 2.2.12? It's a small fix, no side effects except
saving disk space and accesses.
> > [snip]
> > 2) Every single search was of the form "%searchtext%", so no index
> > could be used. I mean, in dbmail 2.2 the index
> > »dbmail_headervalue_3« btree ("substring"(headervalue, 0, 255))
> > was never actually used and is therefore useless. And I guess 99%
> > of all searches will be performed with "where X contains Y", and
> > that's why it will have to be searched as X ILIKE %Y%. We could
> > drop the index and the speed wil be increased, as a never used
> > index only costs performance.
>
> agreed. But the indexes on the xxxfield tables should stay.
Yes, I didn't look at those other headers at all anyway.
> > 3) Searches are performed only on substring(headervalue,0,255). The
> > whole "headervalue" is never actually searched, and nobody seems to
> > have found that. This means either
> > a) people don't search in such long headervalues
> > or
> > b) people were searching but not finding messages and wondering,
> > still they didn't report because they can't believe the computer is
> > wrong.
>
> Both are equally likely. Generally, imap clients tend to prefer
> client-side searching.
OK. So if they search "everything with headers", you can grep through
messageblks anyway, it's all there. Same if "only headers" are searched.
Only if a specific header is searched, the headervalue table is helpful.
> > Is it true that we only search the first 255 chars anyway? Then get
> > rid of the longer headervalues.
> > If we want to search >255 chars in headers: is the index useful at
> > all?
>
> For myself, I wouldn't mind chopping the headervalue field to 255
> chars. But for the sake of correctness we should keep it as a
> text/blob.
As we can drop dbmail_headervalue_3 index anyway, drop that 255 char
field also, and store only the full headervalue. Use that nice
compressing technique Niki implemented already, but without hash. That
might be more overhead than searching the whole table. If it needs be
used, use a hash as short as possible to save storage. A cheap md5 hash
should be enough, maybe less is possible.
mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev