Michael Monnerie wrote:
> Aren't most searches of the form "%searchstring%"? If yes, 

All text-searches (against headervalues or messageblks/mimeparts) are
like that. Searches against XXXfield tables are exact, though possibly
case-insensitive.

> the index is useless, as you can only search "searchstring%" to use the 
> index. And if the index can't be used, it could be dropped altogether. I 
> did some searches with Kmail, the string I searched for shows what I 
> told Kmail to search (ZMI_*, e.g. ZMI_SUBJECT searched for subject)

All headers searches are performed against the headervalue table. The
subjectfield table is for thread=orderedsubject only. The other xxxfield
tables are used for sorting. So I guess you are correct: the index on
dbmail_headervalues.headervalue might as well be dropped.

> [snip]
> 2) Every single search was of the form "%searchtext%", so no index could 
> be used. I mean, in dbmail 2.2 the index
> »dbmail_headervalue_3« btree ("substring"(headervalue, 0, 255))
> was never actually used and is therefore useless. And I guess 99% of all 
> searches will be performed with "where X contains Y", and that's why it 
> will have to be searched as X ILIKE %Y%. We could drop the index and the 
> speed wil be increased, as a never used index only costs performance.

agreed. But the indexes on the xxxfield tables should stay.

> Or is there any client that performs a search like Y%? I guess no. At 
> least, when I search for messages I always search "where X contains Y" 
> and not "where X begins with Y".

That type of search is not possible with IMAP-SEARCH. 'Contains' is the
operative here.

> 3) Searches are performed only on substring(headervalue,0,255). The 
> whole "headervalue" is never actually searched, and nobody seems to have 
> found that. This means either
> a) people don't search in such long headervalues
> or
> b) people were searching but not finding messages and wondering, still 
> they didn't report because they can't believe the computer is wrong.

Both are equally likely. Generally, imap clients tend to prefer
client-side searching.

> Is it true that we only search the first 255 chars anyway? Then get rid 
> of the longer headervalues.
> If we want to search >255 chars in headers: is the index useful at all? 

For myself, I wouldn't mind chopping the headervalue field to 255 chars.
But for the sake of correctness we should keep it as a text/blob.

> If %search% is used all the time, no index can be used and therefore be 
> dropped. Starting from PostgreSQL 8.3, a full text index could be used
> http://www.postgresql.org/docs/8.3/static/textsearch.html

I'm not sure that would be compliant with imap's simple substring matching.

> But I guess we don't want database specific searches?

We already have a framework in place to do database specific queries. So
I don't see a fundamental problem there.

-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev

Reply via email to