Re: Spam package redesign

Janne Jalkanen Sun, 27 Sep 2009 13:12:58 -0700


Heya!

I don't quite understand - why would a floating point value be anymore complicated than an integer?

Note that at the moment we mark a modification a spam if a single testfails, which essentially means doing logical OR more than any sort ofan addition (=threshold is always 1). The integer is there more orless as something-which-was-never-completed-and-I-am-almost-sure-is-the-wrong-design. In fact, I did lament the fact that you can't saythat "X is a bit more efficient than Y" - with an integer-based designyou essentially go for fixed point arithmetics (test A gives you 100,test B gives you 150, etc, where you just shifted decimal points tothe right just for the heck of it).


If you take a look at SpamAssassin, it uses floats.

I'd hesitate changing it later, since it will create an API contract.

/Janne

On Sep 27, 2009, at 22:34 , Andrew Jaquith wrote:

Janne --- good critique, and good suggestions. I've killed the limit-setting methods on the Inspection class itself in favor of alistener strategy. I adopteed your other suggestions too, except...
...at the moment I am inclined to leave the scoring scale as integer-based for now. I like it because it is simple, and resembles what wedo now. We can change it later if need be.
In answer to your first question: the inspector package does nothelp graduation, although it does help the beta that comes after thediploma. :)
Andrew
On Sep 27, 2009, at 4:37, Janne Jalkanen <[email protected]>wrote:
My main concern: does this advance our graduation?

Others inline.

On 25 Sep 2009, at 20:54, Andrew Jaquith wrote:
The "inspect" package provides facilities that allow content to be
inspected and scored based on various criteria such as whether awiki
page change contains spam. Content may be scored by initializing an
Inspection object and then calling the {...@link Inspection#inspect()}
method. The {...@code inspect} method, in turn, iterates through a
variety of previously-configured {...@link Inspector} objects and calls
the {...@link Inspector#inspect(String,String)} method for each one.Each
of the configured inspectors is free to perform whatever evaluations
it wishes, and can increase, decrease or reset the "scores" for any
score category, for example the {...@link Score#SPAM} category.
Ok.
* Score objects supply instructions to the parent Inspectionobject to
increment, decrement or reset the score for a particular category.
Each Score object is constructed with a category (for example,
Score.SPAM), an integer indicating how much to change the score, and
an optional String message that provides context for the change. For
example, a Score that increments the spam score by 1 could be
constructed by new Score( Score.SPAM, 1, "Bot detected." ). Negative
numbers can be supplied also to decrease the score. For convenience,
{...@link Score#INCREMENT_SCORE} means "add 1", {...@link
Score#DECREMENT_SCORE} means "subtract 1", and {...@link Score#RESET}
means "reset to zero."
Smells of overdesign to me. Creating a new operation andmethodology to support addition sounds like a bad idea. Not tomention the fact that you might actually want to use fractions.
Something like inspection.changeScore( Score.SPAM, 0.5f ) soundsmore appealing to me.
public String preSave( WikiContext context, String content ) throws
RedirectException
{
Change change = getChange( context, content );

// Run the Inspection
Inspection inspection = new Inspection( context, m_config,m_inspectors );
inspection.addLimit( Score.SPAM, m_scoreLimit );
inspection.inspect( content, change );
int spamScore = inspection.getScore( Score.SPAM );
context.setVariable( ATTR_SPAMFILTER_SCORE, spamScore );

// Redirect user if score too high
if( inspection.isFailed() )
{
 ...
}
...
}
Yes, this looks unnecessarily complicated and limiting to me. Iwould remove automatic limit-rating altogether, and would justconcentrate on refactoring the individual methods from SpamFilterinto the inspect-package with a light wrapper around them. I thinka callback design is better than adding quite limited limitthingies anyway, as this gives full programmatic control to thedeveloper without us having to pre-guess the possible limitations.
E.g. inspection.inspect( content, change, newInspectionResultListener() {
  public boolean visit(Inspection ins)
  {
       // Any possible decision code goes here
       return ins.getScore() > m_maxScore;
  }
});
Obviously, these listeners would be optional, and you could justlet the entire chain run without interference. And for simplicity,we can provide some default listeners, e.g.
inspection.inspect( content, change, newStopAtScoreListener( Score.SPAM, 5 ) );
(Yeah, I'm a big fan of this pattern 'cos it gives much morecontrol to the developer and eases debugging immensely.)
ChangeRateInspector, LinkCountInspector, SpamInspector (forAkismet),
This should obviously be called AkismetInspector; all the othersdetect spammers too (and Akismet is probably the worst of the lotanyway).
/Janne

Re: Spam package redesign

Reply via email to