Re: Masscheck reuse

2018-10-04 Thread Kevin A. McGrail
Might want to put this on the wiki too!  Adding SASA group too for their
input.
--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


On Thu, Oct 4, 2018 at 10:28 AM Henrik K wrote:

>
> Still hoping to get some conversation going on about reuse.
>
> Personally I create my corpus like this:
>
> - hacked amavisd-milter to save unmodified message copy to "pristine"
>   directory
>
> - run a separate clean install of trunk SA/spamd that has default rules,
>   razor/pyzor/dcc etc, and only runs all "reuse" flagged rules
>   (my recent trunk commit)
> --pre "loadplugin Mail::SpamAssassin::Plugin::Reuse"
> --pre "run_reuse_tests_only 1"
>
> - cron every minute: run messages from "pristine" directory through
>   above spamd to add X-Spam-Status header and move to "corpus"
>
> - a bit later get mailids and resulting ham/spam status from my main
>   amavis, and sort out "corpus" to "corpus_ham/spam" (of course with some
>   manual vetting, dspam crosscheck etc)
>
> Since my main setup uses extreme whitelisting and shortcircuiting, this is
> the only way to get 100% legit corpus.  It takes very little resources
> anyway, since that spamd just runs network lookups (which are mostly cached
> already).
>
> Basically I'd like to see masscheckers do something similar.  It doesn't
> matter where you source the corpus; it is possible to clean the messages up
> to "pristine" status and run them ASAP through a spamd setup like the one
> above.  That way they have a legit X-Spam-Status header that can be reused
> even years later.
>
> Of course if your corpus already has X-Spam-Status from mail receive time
> (and all possible plugins and checks are enabled), then it's simply a case
> of enabling reuse.  But shortcircuited messages should be skipped.
>
> I also recently added REUSE config here:
>
> http://svn.apache.org/viewvc/spamassassin/trunk/masses/contrib/automasscheck-minimal/
>
>
>
>
> On Mon, Sep 03, 2018 at 05:55:05PM +0300, Henrik Krohns wrote:
> >
> > If you look at the ancient mass-check code before Reuse.pm was split from
> > it, it shows the original intention:
> >
> >
> > http://svn.apache.org/viewvc/spamassassin/trunk/masses/mass-check?revision=721962&view=markup
> >
> > # --reuse without --net means we need to just zero ALL net rules; skip net
> > # lookups entirely except for the reused ones.
> > (then it proceeds to zero scores for all "tflags net" rules)
> >
> > Ok I'm not even sure why it's talking about --reuse withOUT --net, since
> > the point here is to do separate scoresets with and without network
> > checks?  One would simply run local checks only, or --reuse --net.
> >
> > If everyone used reuse, would there even be a need for "weekly" masschecks,
> > as every day simply included the network checks!?  If you ask me, without
> > --reuse one would only be allowed to submit "nightly" masschecks (no --net).
> >
> > Current Reuse.pm simply reads "reuse XXX" config clauses, and zeroes scores
> > for those.  So it is important to remember to use "reuse XXX" for any net
> > rules, since it doesn't automatically iterate through them anymore!  Which
> > in my mind is silly; why not simply iterate again through "tflags net" and
> > forget the "reuse" stanza completely?
> >
> > Cheers,
> > Henrik
> >
> >
> >
> >
> > On Mon, Sep 03, 2018 at 05:29:20PM +0300, Henrik K wrote:
> > >
> > > Hey guys,
> > >
> > > I'm wondering why pretty much no masscheck submitter is using --reuse?
> > >
> > > I just committed fixes for lots of missing reuse flags, and now I can
> > > actually do a ./mass-check --reuse --net run without ANY dns lookups
> > > launching.  So it's super fast too.
> > >
> > > What reason would there be to prefer running without reuse?  Is this
> > > simply a case of missing guidance/documentation?  Looking at some corpus
> > > logs, judging by Maildir file timestamps, even messages a few years old
> > > are being run through.  How can that make any sense?  I wouldn't run
> > > anything older than an hour through DNSBLs.
> > >
> > > Of course I understand if someone's messages don't have a scantime
> > > X-Spam-Status header for some reason, but even that could be easily
> > > fixed by simply running the messages through a dedicated spamd as soon
> > > as possible to add the headers.
> > >
> > > Cheers,
> > > Henrik
>
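
The "pristine" to "corpus" step described in the quoted message could be sketched roughly as below. This is only an illustration, not Henrik's actual cron job: the port number, the directory layout, and the function names are assumptions, and the shortcircuit check assumes the X-Spam-Status header has been configured to carry a "shortcircuit=" field, which a stock header may not have.

```python
import email
import re
import subprocess
from pathlib import Path

# Assumption: the dedicated reuse-only spamd listens on this (hypothetical) port.
SPAMC_CMD = ["spamc", "-p", "1783"]


def scan_with_spamc(raw: bytes) -> bytes:
    """Pipe one message through spamc and return whatever it writes back."""
    result = subprocess.run(SPAMC_CMD, input=raw, stdout=subprocess.PIPE, check=True)
    return result.stdout


def has_status(raw: bytes) -> bool:
    """True if the message carries an X-Spam-Status header."""
    return email.message_from_bytes(raw)["X-Spam-Status"] is not None


def is_reusable(raw: bytes) -> bool:
    """True if the scan-time header looks safe to reuse in a masscheck.

    Assumption: shortcircuited scans are marked "shortcircuit=<type>" in the
    header, with "no" meaning the full rule set ran.
    """
    status = email.message_from_bytes(raw)["X-Spam-Status"]
    if status is None:
        return False
    match = re.search(r"shortcircuit=(\S+)", str(status))
    return match is None or match.group(1) == "no"


def process_pristine(pristine: Path, corpus: Path, scan=scan_with_spamc) -> int:
    """Scan every file in `pristine` and move the tagged copy into `corpus`.

    Messages that come back without X-Spam-Status (for example when spamc
    fails over and returns the input unchanged) stay in `pristine` so the
    next run can retry them. Returns the number of messages moved.
    """
    corpus.mkdir(parents=True, exist_ok=True)
    moved = 0
    for msg in sorted(p for p in pristine.iterdir() if p.is_file()):
        scanned = scan(msg.read_bytes())
        if not has_status(scanned):
            continue
        tmp = corpus / (msg.name + ".tmp")
        tmp.write_bytes(scanned)          # write fully, then rename into place
        tmp.rename(corpus / msg.name)
        msg.unlink()
        moved += 1
    return moved
```

Run from cron every minute, a call such as process_pristine(Path("pristine"), Path("corpus")) would cover the tagging step; sorting into corpus_ham/corpus_spam and the manual vetting remain separate steps.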

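The point in the quoted 3 September message about net rules that lack a "reuse" clause could be checked with something like the following. It is a minimal sketch that only looks at "tflags" and "reuse" lines in rule config text; the function names and the sample rule names are made up for illustration, and it is not how Reuse.pm itself works.

```python
def parse_rules(cf_text: str):
    """Collect rule names flagged 'tflags ... net' and rule names with a 'reuse' line."""
    net_rules, reuse_rules = set(), set()
    for line in cf_text.splitlines():
        # Naive comment stripping; good enough for tflags/reuse lines,
        # which do not contain '#' in their arguments.
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 3 and parts[0] == "tflags" and "net" in parts[2:]:
            net_rules.add(parts[1])
        elif len(parts) >= 2 and parts[0] == "reuse":
            reuse_rules.add(parts[1])
    return net_rules, reuse_rules


def missing_reuse(cf_text: str):
    """Return net rules that have no matching 'reuse' clause, sorted by name."""
    net_rules, reuse_rules = parse_rules(cf_text)
    return sorted(net_rules - reuse_rules)


# Hypothetical sample config, not taken from the real rule set.
SAMPLE = """
tflags EXAMPLE_DNSBL_A net
reuse  EXAMPLE_DNSBL_A
tflags EXAMPLE_DNSBL_B net
tflags EXAMPLE_LOCAL   nice
"""

print(missing_reuse(SAMPLE))
```

With the sample above, only EXAMPLE_DNSBL_B is reported, since it is a net rule without a "reuse" line.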

Cron ~/svn/trunk/build/mkupdates/run_nightly | /usr/bin/tee /var/www/automc.spamassassin.org/mkupdates/mkupdates.txt

2018-10-04 Thread Cron Daemon
+ promote_active_rules
+ pwd
+ /usr/bin/perl build/mkupdates/listpromotable
/usr/local/spamassassin/automc/svn/trunk
HTTP get: http://ruleqa.spamassassin.org/1-days-ago?xml=1
HTTP get: http://ruleqa.spamassassin.org/2-days-ago?xml=1
HTTP get: http://ruleqa.spamassassin.org/3-days-ago?xml=1
+ mv rules/active.list.new rules/active.list
+ svn diff rules
+ cat /var/www/ruleqa.spamassassin.org/reports/LATEST
Index: rules/active.list
===================================================================
--- rules/active.list   (revision 1842688)
+++ rules/active.list   (working copy)
@@ -1,5 +1,5 @@
 # active ruleset list, automatically generated from http://ruleqa.spamassassin.org/
-# with results from: day 1: darxus ena-week0 ena-week1 ena-week4 giovanni grenier hege jarif jbrooks mmiroslaw-mails-ham mmiroslaw-mails-spam sihde thendrikx; day 2: ena-week0 ena-week4 jbrooks llanga mmiroslaw-mails-ham mmiroslaw-mails-spam sihde; day 3: ena-week0 ena-week4 jarif jbrooks llanga mmiroslaw-mails-ham mmiroslaw-mails-spam sihde
+# with results from: day 1: darxus ena-week0 ena-week1 ena-week4 giovanni grenier hege jarif jbrooks mmiroslaw-mails-ham mmiroslaw-mails-spam sihde thendrikx; day 2: ena-week0 ena-week4 jarif jbrooks llanga mmiroslaw-mails-ham mmiroslaw-mails-spam sihde; day 3: ena-week0 ena-week4 jbrooks llanga mmiroslaw-mails-ham mmiroslaw-mails-spam sihde
 
 # tflags publish
 AC_BR_BONANZA
@@ -448,6 +448,9 @@
 # good enough
 KB_FORGED_MOZ4
 
+# good enough
+KHOP_DYNAMIC
+
 # tflags publish
 LIST_PRTL_PUMPDUMP
 
@@ -911,9 +914,6 @@
 TO_NO_BRKTS_PCNT
 
 # good enough
-TVD_INCREASE_SIZE
-
-# good enough
 TVD_PH_BODY_META
 
 # good enough
@@ -929,53 +929,62 @@
 TW_GIBBERISH_MANY
 
 # good enough
-ADVANCE_FEE_3_NEW_FRM_MNY
+ADMITS_SPAM
 
 # good enough
-ADVANCE_FEE_5_NEW_FRM_MNY
+ADVANCE_FEE_2_NEW_FRM_MNY
 
 # good enough
-AXB_XMAILER_MIMEOLE_OL_1ECD5
+ADVANCE_FEE_4_NEW_FRM_MNY
 
 # good enough
-BIGNUM_EMAILS
+ADVANCE_FEE_4_NEW_MONEY
 
 # good enough
-DATE_IN_FUTURE_96_Q
+ADVANCE_FEE_5_NEW
 
 # good enough
-DEAR_BENEFICIARY
+BODY_SINGLE_URI
 
 # good enough
-FILL_THIS_FORM_LONG
+DUP_SUSP_HDR
 
 # good enough
-FROM_MISSP_DYNIP
+FILL_THIS_FORM_LOAN
 
 # good enough
-FROM_MISSP_TO_UNDISC
+FROM_MISSPACED
 
 # good enough
-HK_CTE_RAW
+FROM_MISSP_USER
 
 # good enough
-KHOP_DYNAMIC
+HDRS_MISSP
 
 # good enough
-LOTTO_AGENT
+HK_NAME_MR_MRS
 
 # good enough
-MONEY_ATM_CARD
+HK_SCAM_N3
 
 # good enough
-NSL_RCVD_HELO_USER
+LIST_PARTIAL_SHORT_MSG
 
 # good enough
-SERGIO_SUBJECT_VIAGRA01
+MILLION_USD
 
 # good enough
-TO_EQ_FM_DIRECT_MX
+MONEY_BARRISTER
 
+# good enough
+NSL_RCVD_FROM_USER
+
+# good enough
+SHARE_50_50
+
+# good enough
+SHORTENED_URL_SRC
+
 # tflags publish
 UC_GIBBERISH_OBFU
 
+ echo 'Committing promotions in rules/active.list...'
+ svn commit -m 'promotions validated' rules/active.list
Committing promotions in rules/active.list...
Sending        rules/active.list
Transmitting file data .done
Committing transaction...
Committed revision 1842786.
+ /usr/bin/perl masses/rule-qa/list-bad-rules
++ date +%w
+ [[ 4 = 3 ]]
+ for VER in '$VERSIONS'
+ make_tarball_for_version 3.4.2
+ version=3.4.2
+ tmpdir=/usr/local/spamassassin/automc/tmp/stage/3.4.2
+ rm -rf /usr/local/spamassassin/automc/tmp/stage/3.4.2
+ mkdir -p /usr/local/spamassassin/automc/tmp/stage/3.4.2
+ make clean
rm -f \
  SpamAssassin.bso SpamAssassin.def \
  SpamAssassin.exp SpamAssassin.x \
   blib/arch/auto/Mail/SpamAssassin/extralibs.all \
  blib/arch/auto/Mail/SpamAssassin/extralibs.ld Makefile.aperl \
  *.a *.o \
  *perl.core MYMETA.json \
  MYMETA.yml blibdirs.ts \
  core core.*perl.*.? \
  core.[0-9] core.[0-9][0-9] \
  core.[0-9][0-9][0-9] core.[0-9][0-9][0-9][0-9] \
  core.[0-9][0-9][0-9][0-9][0-9] libSpamAssassin.def \
  mon.out perl \
  perl perl.exe \
  perlmain.c pm_to_blib \
  pm_to_blib.ts so_locations \
  tmon.out 
rm -rf \
  *.cache blib \
  doc pod2htm* \
  qmail rules/*.pm \
  rules/70_inactive.cf sa-awl \
  sa-check_spamd sa-compile \
  sa-learn sa-update \
  spamassassin spamc/*.cache \
  spamc/*.o* spamc/*.so \
  spamc/Makefile spamc/config.h \
  spamc/config.log spamc/config.status \
  spamc/qmail-spamc spamc/replace/*.o* \
  spamc/spamc spamc/spamc.h \
  spamc/version.h spamd/*spamc* \
  spamd/spamd t/bayessql.cf \
  t/do_net t/log \
  t/sql_based_whitelist.cf version.env 
mv Makefile Makefile.old > /dev/null 2>&1
+ /usr/bin/perl Makefile.PL PREFIX=/usr/local/spamassassin/automc/tmp/stage/3.4.2
What email address or URL should be used in the suspected-spam report
text for users who want more information on your filter installation?
(In particular, ISPs should change this to a local Postmaster contact)
default text: [the administrator of that system] the administrator of that system

NOTE: settings for "make test" are now controlled using "t/config.dist". 
See that file if you wish to customize what tests are run, and how.

checking module dependencies and their versions...

***