On Fri, 2010-08-20 at 17:47 +0200, Karsten Bräckelmann wrote:
> On Fri, 2010-08-20 at 17:12 +0200, Jan P. Kessler wrote:
> > false-positives hitting on the rules JM_SOUGHT_1 and JM_SOUGHT_2.
> > Unfortunately I cannot give examples as these messages contain
> > confidential customer data (assurance company). We had more than 100
> > false-positives with these rules in the last 2 days.
> 
> I hope you can tell us the __SEEK_* sub-rules triggered, though. That

Jan, any chance you could provide the paragraphs or text parts
corresponding to the seeks?

Just to clarify: We do *not* require the full message, even though it
makes things simpler. In fact, no headers (other than Subject) are ever
used in the sought process.

Anonymizing any personal data is perfectly fine. Moreover, the ham
corpus for sought is not available publicly, but restricted to a few SA
developers only.

The rendered and normalized body text is used to prevent seeks from
appearing in the automatically generated rules -- strings directly
extracted from spam. Thus, by its nature, the FP string itself cannot
possibly be confidential. :)


Please feel free to send FPs to me off-list. However, please do protect
them inside an archive, or send a link where I can pick them up. I'll
take care of adding them to the sought ham corpus.


> would help already. To extract these, either  (a) pipe such a message to
> spamassassin -D, and get the sub-rule from the debug output, or  (b) add
> a specific header only showing the sub-rules.
> 
>   spamassassin --cf="add_header all Subtests _SUBTESTS(,)_"
> 
> Odds are, the FPs are some sort of stupid disclaimer that sneaked into
> the spam corpus.
> 
> Once we know which sub-rule causes the FPs, and preferably get the full,
> original string, we can add the sample to the ham corpus, preventing the
> automated sought process from picking it up.
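
If there are many FPs to go through, pulling the sub-rules out of that
added header can be scripted. A minimal sketch in Python, assuming the
messages were processed with the add_header option quoted above, so that
they carry an X-Spam-Subtests header with a comma-separated list. The
function name seek_subrules and the sub-rule names in the usage note are
made up for illustration.

```python
import email


def seek_subrules(raw_message: str) -> list[str]:
    """Return the sorted, de-duplicated __SEEK_* sub-rules named in the
    X-Spam-Subtests header (assumed to be a comma-separated list)."""
    msg = email.message_from_string(raw_message)
    header = msg.get("X-Spam-Subtests", "")
    # The header may be folded across lines; strip() drops the
    # whitespace and newlines left around each name.
    names = (name.strip() for name in header.split(","))
    return sorted({name for name in names if name.startswith("__SEEK_")})
```

Feeding it a message whose header lists __SEEK_ABC123, __HAS_SUBJECT and
__SEEK_XYZ789 (hypothetical names) returns only the two __SEEK_* entries.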

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
