https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6444

--- Comment #12 from Bradley Kieser <[email protected]> 2010-06-10 17:03:51 EDT ---
(In reply to comment #7)
> Created an attachment (id=4773)
> --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4773)
> optimized patch
> 
> I did some benchmarking on a smallish database (started it a few days ago;
> thanks to the enhancement in Bug 6447 it has only accumulated 5000 tokens so
> far), and Bradley's patch doesn't fare too well. It turns out that it
> assembles the SQL clause for every token, and it unnecessarily updates
> newest_token_age once for each token. Also, the sort is probably redundant.
> 
> I factored out the invariant operations from Bradley's proposal, which
> resulted in the attached patch - and it became about 10 times faster
> for a message that needed to update 150 tokens. The patch also adds
> tok_get_all and tok_touch_all timing measurements to the timing report.
> 
> Probably because of the small set of tokens in my database, the original
> code did even a little bit better than my patched code, although I believe
> that the difference can turn the other way around as reported for a large
> database.
> 
> Here are times in milliseconds for a tok_touch_all() which needed
> to update 150 tokens each time (several runs):
> 
> original tok_touch_all:
>   tok_touch_all: 33
>   tok_touch_all: 16
>   tok_touch_all:  7
>   tok_touch_all:  7
>   tok_touch_all: 21
>   tok_touch_all: 19
>   tok_touch_all: 12
>   tok_touch_all:  6
>   tok_touch_all: 12
>   tok_touch_all:  6
>   tok_touch_all: 29
> 
> new(Mark) tok_touch_all:
>   tok_touch_all: 35
>   tok_touch_all: 40
>   tok_touch_all: 68
>   tok_touch_all: 42
>   tok_touch_all: 33
>   tok_touch_all: 39
>   tok_touch_all: 48
>   tok_touch_all: 33
> 
> new(Bradley) tok_touch_all:
>   tok_touch_all: 413
>   tok_touch_all: 330
>   tok_touch_all: 253
>   tok_touch_all: 525
>   tok_touch_all: 579
>   tok_touch_all: 248
>   tok_touch_all: 329
>   tok_touch_all: 753
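
The factoring-out described in the quoted comment can be sketched roughly as
follows. This is only an illustration in Python with an in-memory SQLite
database, not the code from either patch: the two-table schema is a simplified
assumption, and make_db, touch_per_token, touch_hoisted and snapshot are
hypothetical helper names. The first function rebuilds the SQL and updates
newest_token_age inside the loop; the second does both once per message.

```python
import sqlite3

def make_db(tokens):
    # Simplified, assumed schema; the real tables have more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bayes_token (id INTEGER, token BLOB,"
                 " atime INTEGER, PRIMARY KEY (id, token))")
    conn.execute("CREATE TABLE bayes_vars (id INTEGER PRIMARY KEY,"
                 " newest_token_age INTEGER)")
    conn.execute("INSERT INTO bayes_vars VALUES (1, 0)")
    conn.executemany("INSERT INTO bayes_token VALUES (1, ?, 0)",
                     [(t,) for t in tokens])
    return conn

def touch_per_token(conn, user_id, tokens, atime):
    # Invariant work repeated for every token: the SQL text is assembled
    # and newest_token_age is updated once per iteration.
    for tok in tokens:
        sql = ("UPDATE bayes_token SET atime = ?"
               " WHERE id = ? AND token = ? AND atime < ?")
        conn.execute(sql, (atime, user_id, tok, atime))
        conn.execute("UPDATE bayes_vars SET newest_token_age = ?"
                     " WHERE id = ? AND newest_token_age < ?",
                     (atime, user_id, atime))

def touch_hoisted(conn, user_id, tokens, atime):
    # Invariant work hoisted out of the loop: one SQL string for all
    # tokens, and a single newest_token_age update at the end.
    sql = ("UPDATE bayes_token SET atime = ?"
           " WHERE id = ? AND token = ? AND atime < ?")
    conn.executemany(sql, [(atime, user_id, tok, atime) for tok in tokens])
    conn.execute("UPDATE bayes_vars SET newest_token_age = ?"
                 " WHERE id = ? AND newest_token_age < ?",
                 (atime, user_id, atime))

def snapshot(conn):
    # Final state of both tables, for comparing the two variants.
    toks = conn.execute(
        "SELECT token, atime FROM bayes_token ORDER BY token").fetchall()
    age = conn.execute("SELECT newest_token_age FROM bayes_vars").fetchall()
    return toks, age
```

Both variants leave the tables in the same state; only the amount of repeated
work per token differs.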


Please see my comment below.

When I submitted the patch, I forgot to include the crucial index that it
also needs:

    create index bayes_token_idx1 on bayes_token(token);
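
As a rough way to see why the index matters, the sketch below asks SQLite for
its query plan with and without it. Several things here are assumptions for
illustration only: the cut-down bayes_token schema, the use of SQLite rather
than the database actually behind the benchmarks, an update that looks rows up
by token alone, and the helper name plan_for_token_lookup.

```python
import sqlite3

def plan_for_token_lookup(with_index: bool) -> str:
    """Return SQLite's query plan for touching one token by value."""
    conn = sqlite3.connect(":memory:")
    # Assumed, simplified schema: the primary key starts with id, so it
    # does not directly serve a lookup on token alone.
    conn.execute("CREATE TABLE bayes_token ("
                 " id INTEGER NOT NULL,"
                 " token BLOB NOT NULL,"
                 " atime INTEGER NOT NULL DEFAULT 0,"
                 " PRIMARY KEY (id, token))")
    if with_index:
        conn.execute("CREATE INDEX bayes_token_idx1 ON bayes_token(token)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN"
        " UPDATE bayes_token SET atime = ? WHERE token = ?",
        (1276203831, b"abcde"),
    ).fetchall()
    conn.close()
    # The fourth column of each plan row is the human-readable detail.
    return " | ".join(row[3] for row in rows)

if __name__ == "__main__":
    print("without index:", plan_for_token_lookup(False))
    print("with index:   ", plan_for_token_lookup(True))
```

The exact plan wording varies between SQLite versions, and other databases
plan queries differently, so this only illustrates the general effect.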
