Theo Van Dinter writes:
> On Wed, Sep 06, 2006 at 07:07:42PM +0100, Justin Mason wrote:
> > the problem is that it needs to read the rules from rulesrc/sandbox/* --
> > and those rules are pretty dependent in places on the rules in
> > rulesrc/core.  Those rules, in turn, are the 3.2.0 core ruleset, which
> > doesn't mix well with (ie stomps all over) the 3.1.x core ruleset.
> >
> > We could come up with a way to use the 3.2.0 core ruleset in place of
> > the 3.1.x one -- but I think the effort required would be too much, esp.
> > since it's easier to just concentrate on the 3.2.0 release instead.
>
> I'm not sure I agree with this, and we *need* to solve this problem
> going forward, or else we won't be able to do 3.2 updates when we're
> working on 3.3.
>
> The rules are pretty version agnostic, except for the ones which have a
> dependency on a plugin or other code change that 3.1 doesn't have.  I think
> it'd be pretty easy to do a run with the 3.2 code and run with the 3.1 code
> and figure out which those are.
>
> Rules that don't work the same get an "if version" wrapper, the rest can stay
> the way they are.  We can also look at backporting the differences as
> appropriate.

OK -- agreed.  However, my point is that we'd be better off doing that
work as part of the 3.2.0 development, and later for 3.3.0 -- rather than
trying to "retrofit" it into 3.1.6 or 3.1.7, I think.  There's no *need*
to keep 3.1.x going, if we can start getting 3.2.0 released instead.

> As for rulesrc, mkrules, etc -- 3.1 doesn't need any of that.  This is also my
> main issue with how 3.2 currently does stuff.  I don't understand why this
> stuff is part of the normal distro.  I like to think of the distro as the
> engine side of the project, and mkrules/rulesrc as the rules side of the
> project, and there's no reason they have to be together.
>
> So for 3.1, we generate, externally, the rules directory and include it in the
> directory that gets mass-check'ed.  For 3.2, same thing.  Then in the normal
> SA distribution, we don't need the whole svn:external/rulesrc/mkrules/etc
> stuff, it'll just be a rules dir like before.

Hmm -- I think I'd need more details of how that'd work -- I'm not sure
I get it.  One thing I'd want to avoid is having to set up two separate SVN
workspaces to get a usable checkout, or having to download two separate
tarballs to get a usable release.  In my opinion, the core code is nearly
useless without rules, so there isn't a need to ship it without them.

> > last week featuring data from 7 contributors.
>
> Hrm.  It still seems like a small number of messages/diversity:
>
>       6   0.0 ham-bb-doc.log
>   14998  18.9 ham-bb-jm.log
>       6   0.0 ham-bb-zmi.log
>    6357   8.0 ham-cthielen.log
>    1510   1.9 ham-daf.log
>     167   0.2 ham-dos.log
>    1958   2.5 ham-parkerm.log
>   46895  59.0 ham-theo.log
>    2028   2.5 ham-wtogami.log
>    5619   7.1 ham-zmi.log
>
>   15006   3.9 spam-bb-doc.log
>   15000   3.9 spam-bb-jm.log
>    8358   2.2 spam-bb-zmi.log
>   13783   3.5 spam-cthielen.log
>    6261   1.6 spam-daf.log
>    4676   1.2 spam-dos.log
>   61619  15.9 spam-parkerm.log
>  253448  65.2 spam-theo.log
>    2156   0.6 spam-wtogami.log
>    8359   2.2 spam-zmi.log
>
> (that's 468210 total, btw)  and why does zmi have two sets of files?

It looks like his spam collections are a dup, alright.

For what it's worth, the "bb-*" mass-checks are limited to 15k messages of
each type, max, since mass-checking old spam is pointless.  I should set it
up to allow more ham, however, since old ham is fine.

I was also thinking we should set up some trusted spamtraps to collect lots
of spam with "live" network test data -- I think most of the spam corpora
we're mass-checking nowadays are incomplete.  For example, my corpus will
omit everything that hit SBL+XBL, and Michael's is similarly omitting lots
of those too.  Nowadays spamtrapping may be the only viable way to get a
really representative spam corpus....

--j.
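
The per-log figures quoted above can be rechecked with a short script. This is only a sketch: the counts are copied from the listing, and it assumes that the second column is each log's share of its own class total (ham or spam). The names HAM, SPAM and shares are made up for this illustration and are not part of mass-check.

```python
# Message counts copied from the quoted mass-check log listing.
HAM = {
    "ham-bb-doc.log": 6,
    "ham-bb-jm.log": 14998,
    "ham-bb-zmi.log": 6,
    "ham-cthielen.log": 6357,
    "ham-daf.log": 1510,
    "ham-dos.log": 167,
    "ham-parkerm.log": 1958,
    "ham-theo.log": 46895,
    "ham-wtogami.log": 2028,
    "ham-zmi.log": 5619,
}
SPAM = {
    "spam-bb-doc.log": 15006,
    "spam-bb-jm.log": 15000,
    "spam-bb-zmi.log": 8358,
    "spam-cthielen.log": 13783,
    "spam-daf.log": 6261,
    "spam-dos.log": 4676,
    "spam-parkerm.log": 61619,
    "spam-theo.log": 253448,
    "spam-wtogami.log": 2156,
    "spam-zmi.log": 8359,
}


def shares(counts):
    """Return {log name: percent of the class total}, rounded to one decimal.

    Assumption: percentages are relative to the class (ham or spam) total,
    not to the combined total.
    """
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for label, counts in (("ham", HAM), ("spam", SPAM)):
        print(f"{label}: {sum(counts.values())} messages")
        # Largest contributors first, to make the skew easy to see.
        for name, pct in sorted(shares(counts).items(), key=lambda kv: -kv[1]):
            print(f"  {counts[name]:7d} {pct:5.1f} {name}")
    print("total:", sum(HAM.values()) + sum(SPAM.values()))
```

Sorting by share puts the dominant logs at the top of each class, which is the diversity concern raised in the quoted text.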
