Hello.

While looking into low-level disk usage optimization, I ran some simple
performance tests based on full-text searches (2.4 branch). Metadata always
resides on local disks, while messages are on slower hardware.

I noticed that full-text searches for short strings take much longer than
searches for longer ones. For example, a full-text search for a 3-letter string
takes >60 s, while a 9-letter string on the same corpus takes ~20 s. I repeated
these tests many times to rule out disk caching as the cause: reversing the
search order (longer string first) makes no difference.

So I opened up the Cyrus source code and looked for search-related code. As I
understand it, squatter is not used if the search string is shorter than 4
bytes. From squat.h it's quite clear:

/*
Don't change this unless you're SURE you know what you're doing.
Its only effect on the API is that searches for strings that are
shorter than SQUAT_WORD_SIZE are not allowed.
In SQUAT, a 'word' simply refers to a string of SQUAT_WORD_SIZE
arbitrary bytes.
*/

#define SQUAT_WORD_SIZE 4
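
If my reading is right, the effect on a search would be roughly the check
sketched below. This is only an illustration of my assumption: the helper
name term_fits_squat_index is made up and does not exist in the Cyrus source,
and what the server actually does with a too-short term (I suspect it scans
the raw message files) is exactly what I am asking about.

```c
#include <stdbool.h>
#include <string.h>

/* Value quoted from squat.h above. */
#define SQUAT_WORD_SIZE 4

/*
 * Hypothetical helper, NOT part of Cyrus: returns true when a search
 * term is at least SQUAT_WORD_SIZE bytes long, i.e. long enough to be
 * looked up in a SQUAT index according to the comment in squat.h.
 */
static bool term_fits_squat_index(const char *term)
{
    return term != NULL && strlen(term) >= (size_t)SQUAT_WORD_SIZE;
}
```

With this check, the 3-letter search from my tests would not qualify for the
index, while the 9-letter one would, which matches the timings I see.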

So, a question for anyone who knows the squatter implementation in Cyrus: is
this lower limit applied to all searches? Body, subject, address(es)?

And does this lower bound still apply to the 3.0 branch and its new indexing
engine, Xapian?

Low-level disk compression and optimization aside, a client might not cope
well with long search times during which no data arrives on the IMAP channel,
and may drop the connection (or a network device could do it). So, if searching
for short strings means reading all the raw message files, I should warn users
through the client interface about possible failures, since the mail corpus
keeps growing. That is, until we upgrade to 3.0, if that helps.

Thanks,

Paolo