> I would do that differently. I would query the default (uid 0)
You are right ! I've learnd that uid 0 is the default very recently
and forgot to take it into account
> According to dspam.conf:[...]
so, I've taken your updated script, and crafted it to follow as close as
possible what dspam_clean does (at least as said in the man page).
So my modifs are :
- add the same sql variables as dspam_clean manages
- load uid 0 training pref (in a single query)
I've also added a query to delete old token whose probability is between 0.35
and 0.65.
For the transaction parts, I've always though that a single SQL query is
always atomic, so no need for a transaction for just one query.
Am I wrong ?
>> -- Cleanup dictionnaries of passive users
> Such a query should run as one of the first queries. But why do you punish
> users not having reclassified anything?
Then that may be too specific to my setup ... :)
So, how about this new proposition ?
Nicolas
--
-- Set some defaults
--
SET @TrainingMode = 'TEFT'; -- Default training mode
SET @PurgeSignatures = 14; -- Stale signatures
SET @PurgeNeutral = 90; -- Tokens with neutralish probabilities
SET @PurgeUnused = 90; -- Unused tokens
SET @PurgeHapaxes = 30; -- Tokens with less than 5 hits (hapaxes)
SET @PurgeHits1S = 15; -- Tokens with only 1 spam hit
SET @PurgeHits1I = 15; -- Tokens with only 1 innocent hit
SET @today = to_days(current_date());
--
-- Delete tokens with less than 5 hits (hapaxes)
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
FROM dspam_token_data
WHERE from_days(@tod...@purgehapaxes) > last_hit
AND (2*innocent_hits)+spam_hits < 5;
COMMIT;
--
-- Delete tokens with only 1 spam hit
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
FROM dspam_token_data
WHERE from_days(@tod...@purgehits1s) > last_hit
AND innocent_hits = 0 AND spam_hits = 1;
COMMIT;
--
-- Delete tokens with only 1 innocent hit
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
FROM dspam_token_data
WHERE from_days(@tod...@purgehits1i) > last_hit
AND innocent_hits = 1 AND spam_hits = 0;
COMMIT;
--
-- Delete tokens with neutralish probabilities
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
FROM dspam_token_data
WHERE from_days(@tod...@purgeneutral) > last_hit
AND spam_hits/(innocent_hits+spam_hits) BETWEEN 0.35 AND 0.65
COMMIT;
--
-- Delete unused tokens, except for TOE, TUM and NOTRAIN modes
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
FROM t USING dspam_token_data t
LEFT JOIN dspam_preferences p ON p.preference = 'trainingMode' AND p.uid =
t.uid
LEFT JOIN dspam_preferences d ON d.preference = 'trainingMode' AND d.uid = 0
WHERE from_days(@tod...@purgeunused) > last_hit
AND COALESCE(p.value,d.value,@TrainingMode) NOT IN ('TOE','TUM','NOTRAIN');
COMMIT;
--
-- Delete TUM tokens seen no more than 50 times
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
FROM t USING dspam_token_data t
LEFT JOIN dspam_preferences p ON p.preference = 'trainingMode' AND p.uid =
t.uid
LEFT JOIN dspam_preferences d ON d.preference = 'trainingMode' AND d.uid = 0
WHERE from_days(@tod...@purgeunused) > last_hit
AND COALESCE(p.value,d.value,@TrainingMode) = 'TUM'
AND innocent_hits + spam_hits < 50;
COMMIT;
--
-- Delete stale signatures
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
FROM dspam_signature_data
WHERE from_days(@tod...@purgesignatures) > created_on;
COMMIT;
------------------------------------------------------------------------------
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev
_______________________________________________
Dspam-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-devel