On Fri, 2010-10-15 at 17:29 -0300, Marco Ribeiro wrote:
> Does anyone know of a good up-to-date Ham Corpus? I'm using
> SpamArchive for spam, but I haven't found a good one for Ham.
> I'm not sure if I understood this [1] correctly, but does that mean I
> must upload my own corpus if I want to perform a mass check on the
> server? If not, is it possible to download others' corpora?
Uhm, what is it you want to do?

The UploadedCorpora [1] wiki page is for Rule QA. The mass-check refers to checking SA rules against a corpus of ham and spam (hand-classified ham and spam, that is) to evaluate the performance and accuracy of SA rules and for re-scoring.

I guess what irritates me most is that it's unclear why you want to perform a "mass check on the server" in the context of asking for other folks' ham corpora.

And no, you cannot download other contributors' ham corpora. See the Privacy section in your reference.

[1] http://wiki.apache.org/spamassassin/UploadedCorpora

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
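The core idea behind evaluating rules against hand-classified ham and spam can be illustrated with a toy tally. The sketch below is hypothetical and is not SpamAssassin's actual mass-check tooling or its log format. It assumes you already know which rules fired on each message, represented as one set of rule names per message. The rule names RULE_A and RULE_B are made up. For each rule, it reports the percentage of spam hit, the percentage of ham hit, and an S/O ratio, defined here as spam% / (spam% + ham%).

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    spam_pct: float  # percent of spam messages the rule fired on
    ham_pct: float   # percent of ham messages the rule fired on
    so_ratio: float  # spam% / (spam% + ham%); 1.0 = never fired on ham


def rule_stats(spam_hits, ham_hits):
    """Tally per-rule hit rates.

    spam_hits / ham_hits: lists with one set of (hypothetical) rule names per
    hand-classified message. Both corpora must be non-empty.
    """
    if not spam_hits or not ham_hits:
        raise ValueError("need at least one spam and one ham message")
    rules = set().union(*spam_hits, *ham_hits)
    stats = {}
    for rule in sorted(rules):
        spam_pct = 100.0 * sum(rule in hits for hits in spam_hits) / len(spam_hits)
        ham_pct = 100.0 * sum(rule in hits for hits in ham_hits) / len(ham_hits)
        total = spam_pct + ham_pct
        stats[rule] = RuleStats(spam_pct, ham_pct, spam_pct / total if total else 0.0)
    return stats


if __name__ == "__main__":
    # Made-up example data: two spam and two ham messages.
    spam = [{"RULE_A", "RULE_B"}, {"RULE_A"}]
    ham = [{"RULE_B"}, set()]
    for name, s in rule_stats(spam, ham).items():
        print(f"{name:8} SPAM% {s.spam_pct:6.1f}  HAM% {s.ham_pct:6.1f}  S/O {s.so_ratio:.3f}")
```

A rule that fires on a lot of spam but never on ham (S/O near 1.0) is what you hope for. Hits on ham are why the ham side must be hand-verified and why nobody can simply borrow a corpus they can't inspect.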
