> Please let me know what you think!
Daryl and Chris both make a number of good points, but the buildbot idea
also seems to have a good deal of merit. A creative solution for the
'private corpus' problem that Chris mentions might help a lot though.
Unfortunately I don't have one at the moment, but hopefully someone else
will.
One thing I noticed is a trivial potential procedural problem. You mention:
> If it passes, it's checked in as "rulesrc/sandbox/mailed/latest.cf",
> triggering the preflight buildbot to start its mass-checks.
But what if several files are submitted at about the same time? Do we need
"latest1.cf", "latest2.cf", etc? Certainly you don't want a submitted
request getting lost because another came along just after it.
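
To make the worry concrete, here is a rough sketch of one way to avoid the
collision: give each submission its own file instead of reusing a single
"latest.cf". This is only an illustration, not how the proposed system works.
The function name and the "submitted-" prefix are made up for the example, and
it assumes the check-in step can pick up whatever files appear in a directory.

```python
import os
import tempfile


def save_submission(rules_text, directory):
    """Store one submitted rules file under a name no other submission gets.

    Hypothetical helper: the name and the "submitted-" prefix are
    assumptions for illustration, not part of the proposal.
    """
    # mkstemp creates and opens the file in one step, so two submissions
    # arriving at nearly the same time cannot end up with the same name.
    fd, path = tempfile.mkstemp(prefix="submitted-", suffix=".cf", dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(rules_text)
    return path
```

Each call returns a distinct path, so a second submission never overwrites the
first; the buildbot would then work through the files in whatever order it
likes.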
I also think solving the "recent spam" problem should be viewed as a
critical goal. For the last few weeks I've been off and on submitting new
Leo rules as fast as he mutates his drug spams. Generally they hit zero in
all of the SARE corpora, even though I have dozens of spams from that day
that they do hit. Or I have stock scam rules that hit virtually nothing in
the corpus, but have been hitting for weeks in my own non-official spam.
Clearly a way to get a fairly wide selection of fresh spam is a good thing,
even if it (obviously) means purging the spam after 30 days or so (although
I'd like to see it go back 60 if possible to catch recurring stuff.)
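
For what it's worth, the purge itself could be very simple. The sketch below
assumes one message per file in a single directory, and that the file's
modification time is a fair stand-in for when the spam arrived; both are my
assumptions, as is the function name, and a real corpus may well be laid out
differently.

```python
import os
import time


def purge_old_spam(corpus_dir, max_age_days=60, now=None):
    """Delete corpus files older than max_age_days; return the removed names.

    Hypothetical helper. Assumes one message per file and that mtime
    approximates the arrival date.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for name in sorted(os.listdir(corpus_dir)):
        path = os.path.join(corpus_dir, name)
        # Only plain files are considered; subdirectories are left alone.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

With max_age_days=60 anything older than 60 days is dropped and everything
newer is kept, so recurring campaigns within that window remain available
for mass-checks.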
A related point on history: a day or so ago Justin replied to a test of
Bob's, from about a month back, on the historical hits for a given rule. It
might be nice ("nice", not "required") if something like this could be built
into the rules project.
Loren