http://bugzilla.spamassassin.org/show_bug.cgi?id=3225
------- Additional Comments From [EMAIL PROTECTED] 2004-11-01 16:20 -------

> From Michael Parker 2004-04-12 15:00
> 1) Implements Sidney's tok_get_all method for SQL and DBM. Right now the SQL
> version will get the tokens from the DB in chunks (100, 50, 25, 5, 1) which
> needs to be benchmarked and tweaked based on what works the best.

For MySQL at least, is there any reason to try to cache Bayes token queries? Every time a token's atime is updated, the query cache for the entire bayes_token table is invalidated, so these queries are very rarely actually served from cache. Since the token queries don't benefit from the SQL server's cache, there's no point in caching them (the cached entries will be invalidated anyway) and no need to worry about blowing away the cache (which I believe was the reason behind the bunches).

I recently timed token queries for about 4600 messages as they were received by my mail server. Replacing the fixed bunch sizes with a while loop that queries up to 100 tokens at a time (I didn't want to exceed any maximum query length) significantly decreased the token-query time for an average message (193 tokens), by about 57%:

Using current bunches:
  2295 messages
  194 tokens per message average
  1.227 seconds per message
  0.00630 seconds per token

Using loop, up to 100 tokens at a time:
  2302 messages
  192 tokens per message average
  0.511 seconds per message
  0.00266 seconds per token

With the loop the cache is still cleared on every atime update, just as with bunches. SQL_NO_CACHE could be inserted into the statement to avoid the overhead of the unused cache insertion.

I'd imagine other SQL servers behave similarly, but I'm not familiar with how other servers (Oracle, Postgres, etc.) handle caching, specifically what causes a table's cache to be invalidated.
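The while-loop batching described above can be sketched as follows. This is a hypothetical Python illustration, not SpamAssassin's actual Perl implementation; the bayes_token column names and the DB-API qmark placeholder style are assumptions (MySQL drivers typically use %s, and the MySQL-specific SQL_NO_CACHE hint would be added to the SELECT there).

```python
def tok_get_all(cursor, tokens, batch_size=100):
    """Fetch rows for the given tokens, issuing SELECTs of up to
    batch_size tokens each, so no single query grows unboundedly.

    Hypothetical sketch: assumes a DB-API cursor and a bayes_token
    table with (token, spam_count, ham_count) columns.
    """
    rows = []
    remaining = list(tokens)
    while remaining:
        # Take the next batch of up to batch_size tokens.
        batch = remaining[:batch_size]
        remaining = remaining[batch_size:]
        placeholders = ",".join(["?"] * len(batch))
        # On MySQL, "SELECT SQL_NO_CACHE ..." would skip the unused
        # query-cache insertion described in the comment above.
        cursor.execute(
            "SELECT token, spam_count, ham_count FROM bayes_token "
            "WHERE token IN (%s)" % placeholders,
            batch,
        )
        rows.extend(cursor.fetchall())
    return rows
```

For an average message of ~193 tokens this issues two SELECTs (100 + 93) instead of a fixed descending series of bunch sizes.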