Mikhail Ramendik <[EMAIL PROTECTED]> said:

> Aaron Stone wrote:
>> mysql> SELECT DISTINCT(physmessage_id) FROM dbmail_messageblks WHERE
>> messageblk LIKE '%From:%Aaron%';
>> 
>> 11089 rows in set (31.17 sec) [cpu hovered around 50%]
> 
> Could you please also benchmark
> 
> SELECT DISTINCT(physmessage_id) FROM dbmail_messageblks WHERE
> messageblk LIKE '%Aaron%';

19066 rows in set (12.81 sec)

So it takes about 41% of the time, but returns about 172% of the rows. We'll
have to combine this time trial with the header parser to see which adds
more time per message. My guess is the header parser, but we'll just have
to see.

> I mean, if LIKE is multiline and so we're actually searching for all 
> messageblks which have "Aaron" *anywhere* after "From:", why bother with 
> "From:" ?

This might be important if you're searching for uncommon headers. Of
course, uncommon headers probably also mean that they're uncommonly
searched for...

> Of course, until we have is_header, this can sort out some of the false 
> positives of the non-header type... Some will remain, so we must take 
> care not to parse them, probably by comparing their messageblk_id's to a 
> buffered header messageblk_id list for the mailbox.

If we group by physmessage_id and sort ascending by messageblk_idnr, then
we'll just look at the first block for each given physmessage_id. This may
break in 2.1 if we eliminate auto_increment columns, but we can
rethink it when we get there.
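
A rough sketch of that first-block idea, in case it helps to see it
written out. Everything here is an assumption for illustration: it uses
Python's sqlite3 with an in-memory table as a stand-in for MySQL, a
trimmed-down dbmail_messageblks with only three columns, and made-up
sample rows. It relies on the lowest messageblk_idnr for a
physmessage_id being the header block.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table (assumed, trimmed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dbmail_messageblks ("
    " messageblk_idnr INTEGER PRIMARY KEY,"
    " physmessage_id INTEGER NOT NULL,"
    " messageblk TEXT NOT NULL)"
)

# Hypothetical sample data: the first block of each message is its header.
conn.executemany(
    "INSERT INTO dbmail_messageblks VALUES (?, ?, ?)",
    [
        (1, 10, "From: Aaron Stone\nSubject: hi\n"),  # header, real match
        (2, 10, "body text"),
        (3, 11, "From: Mikhail\nSubject: re\n"),      # header, no match
        (4, 11, "From: what Aaron wrote earlier"),    # body, false positive
        (5, 12, "From: Someone\nTo: Aaron\n"),        # header, multiline match
    ],
)

# The query from the benchmark: matches any block, header or body.
naive = [r[0] for r in conn.execute(
    "SELECT DISTINCT physmessage_id FROM dbmail_messageblks"
    " WHERE messageblk LIKE '%From:%Aaron%'"
    " ORDER BY physmessage_id"
)]

# Only test the block with the lowest messageblk_idnr per physmessage_id.
first_only = [r[0] for r in conn.execute(
    "SELECT b.physmessage_id"
    " FROM dbmail_messageblks b"
    " JOIN (SELECT physmessage_id, MIN(messageblk_idnr) AS first_blk"
    "       FROM dbmail_messageblks GROUP BY physmessage_id) f"
    "   ON f.first_blk = b.messageblk_idnr"
    " WHERE b.messageblk LIKE '%From:%Aaron%'"
    " ORDER BY b.physmessage_id"
)]

print(naive)       # [10, 11, 12]
print(first_only)  # [10, 12]
```

Message 11 drops out because its only match is in a body block. Message
12 still comes back even though "Aaron" is on the To: line, so the
header parser would still have to weed that one out.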

Aaron
