> >>>> One useful factor of ham is that it's not time-sensitive; a mail that
> >>>> was ham in 2003 would still be ham today.  So we can collect old ham
> >>>> mail archives, or submissions of relatively old mail, if necessary.
> >>>
> >>> This may be a false assumption.  A spamvertised or spam sending
> >>> domain from 2003 could have expired and been re-registered by
> >>> a different organization.  Same for ham.  Both ham and spam
> >>> should have expiration times.  1 year would probably be good,
> >>> since spamvertised domains probably don't get renewed.
> >>
> >> yep, I was talking with a SURBLer about this last week I think.  we
> >> should probably add meta conditions to the URIBL ruleset to ensure
> >> they don't fire at all on old messages.
>
> if we had enough ham to get useful results with that limit, sure.  As
> it is, I'm not sure that's the case.

Btw, I just came across this article (from CEAS 2009):

Jose-Marcio Martins da Cruz, Gordon V. Cormack:
  Using old Spam and Ham Samples to Train Email Filters

http://www.j-chkmail.org/ceas/ceas09-gvcjm.pdf


  Mark
