A code review of the RuleQA system shows that, as designed, it takes a week to recover from a corpus starvation problem that affects the weekly net checks.
Masscheck runs both net and non-net scoring every day, looking back 7 days to get the last network masscheck results. RuleQA is currently not publishing rules because it is hung up on last Friday's net masscheck, which was ham-starved due to llanga's scheduling problem.
This will recover on its own by the end of the week. Proposed code changes to bypass a starved network masscheck and reuse the current network set scores are attached.
-- John Hardin KA7OHZ http://www.impsec.org/~jhardin/
# Masscheck Rescore Failure Analysis and Proposed Fix

## Background

The nightly rescore pipeline on sa-vm runs as a single cron job at 02:25 UTC daily:

```
/usr/local/spamassassin/automc/svn/trunk/build/mkupdates/do-stable-update-with-scores force
```

This calls `do-nightly-rescore-example.sh force`, which in turn calls `generate-new-scores.sh` twice, once for each of two scoresets, then merges the results and commits to SVN. If the commit succeeds, `mkupdate-with-scores` builds and publishes the signed update tarball.

## Scoreset Definitions

| Set | Description | Log pattern | mtime selection window |
|-----|-------------|-------------|------------------------|
| 0 | Non-net, non-bayes | `ham-*.log` / `spam-*.log` | `-2` days |
| 1 | Net-enabled | `ham-net-*.log` / `spam-net-*.log` | `-7` days |
| 2 | Bayes-enabled | (copied from set 0) | n/a |
| 3 | Bayes + net | (copied from set 1) | n/a |

Contributors run `nightly_mass_check` (no `--net`) daily and `weekly_mass_check` (`--net`) on Saturdays only, controlled by `automasscheck-minimal.sh`. The `mtime` windows are what actually control log selection at scoring time; contributor upload timing determines which logs fall within those windows.

## The Failure Mode

### Root Cause: `set -e` with Sequential Scoreset Runs

`do-nightly-rescore-example.sh` uses `set -e` and runs both scoresets sequentially, with set 1 first on non-Sunday days:

```bash
set -e
...
else
    echo 'Not Beginning of Week. Running with 1 first.'
    $PROGDIR/generate-new-scores.sh 1 $1   # <-- runs first Mon-Sat
    $PROGDIR/generate-new-scores.sh 0 $1   # <-- never reached if set 1 fails
    SCORESET=0
fi
```

`generate-new-scores.sh` exits with a nonzero code when corpus thresholds are not met, e.g. exit 8 for insufficient ham message count:

```bash
if [[ "$HAMCOUNT" -lt "$MINHAMCOUNT" ]]; then
    echo "Insufficient ham corpus to generate scores; aborting."
    exit 8
fi
```

Because `set -e` is active, this nonzero exit immediately terminates `do-nightly-rescore-example.sh`.
**Set 0 never runs.**

### Why Set 1 Blocks Set 0

Set 1 fails with a logged "Insufficient ham corpus" message. Set 0 logs are contributed daily, but set 0 scoring is never given the chance to run when set 1 fails.

### Cascading Effect

When set 1 fails:

1. `generate-new-scores.sh 1` exits nonzero
2. `set -e` kills `do-nightly-rescore-example.sh`
3. `do-stable-update-with-scores` sees a nonzero exit and sends a failure email
4. `mkupdate-with-scores` is never called
5. No rule update is published, including any bugfixes committed to SVN

The failure is silent from a set-0 perspective: the cron email shows only the set-1 output, giving no indication of whether set 0 would have succeeded.

## Observed Instance

On Friday 11 April 2026, the run failed with:

```
HAM: 66682 (100000 required)
Insufficient ham corpus to generate scores; aborting.
```

This was set 1 (net). Set 0 was never attempted.

## Proposed Fix

Three files require changes and one new file is introduced.

### 1. `masses/rule-update-score-gen/do-nightly-rescore-example.sh`

**Proactively** check out the existing `72_scores.cf` and extract per-set fallback scores at the very start of the script, before any scoring runs. This avoids SVN I/O in the error path and provides a fallback REVISION value without conditional logic later.

Then run both scoresets independently, capturing exit codes. If one fails, use the pre-extracted fallback scores for that set. Abort only if both sets fail.

```bash
# --- Proactive fallback: fetch current scores before anything can fail ---
svn co https://svn.apache.org/repos/asf/spamassassin/trunk/rulesrc/scores \
    trunk-rulesrc-scores

# Extract per-set fallback scores from the existing 72_scores.cf.
# These will be used if a scoreset run fails.
for SET in 0 1 2 3; do
    $PROGDIR/extract-scoreset trunk-rulesrc-scores/72_scores.cf $SET \
        > scores-set${SET}-fallback
done

# Fallback REVISION: the SVN revision of the existing scores checkout.
# Will be overridden below if fresh scores are generated successfully.
FALLBACK_REVISION=`svn info trunk-rulesrc-scores | \
    awk '/^Last Changed Rev:/ {print $4}'`

# --- Run both scoresets independently ---
set +e
$PROGDIR/generate-new-scores.sh 1 $1
RC1=$?
$PROGDIR/generate-new-scores.sh 0 $1
RC0=$?
set -e

if [[ $RC1 -ne 0 && $RC0 -ne 0 ]]; then
    echo "Both scoresets failed (set0 exit $RC0, set1 exit $RC1), aborting."
    exit 1
fi

# Apply fallback scores for any failed set and record which sets used fallback.
PARTIAL_NOTE=""
if [[ $RC1 -ne 0 ]]; then
    echo "Set 1 scoring failed (exit $RC1); carrying forward existing scores."
    cp scores-set1-fallback scores-set1
    cp scores-set3-fallback scores-set3
    PARTIAL_NOTE="$PARTIAL_NOTE [set1 carried forward: exit $RC1]"
fi
if [[ $RC0 -ne 0 ]]; then
    echo "Set 0 scoring failed (exit $RC0); carrying forward existing scores."
    cp scores-set0-fallback scores-set0
    cp scores-set2-fallback scores-set2
    PARTIAL_NOTE="$PARTIAL_NOTE [set0 carried forward: exit $RC0]"
fi
```

The `cp scores-setN scores-set{N+2}` steps for sets 2 and 3 only apply when the corresponding set was freshly scored (fallback already handled sets 2 and 3 above):

```bash
[[ ! -f scores-set2 || scores-set0 -nt scores-set2 ]] && cp scores-set0 scores-set2
[[ ! -f scores-set3 || scores-set1 -nt scores-set3 ]] && cp scores-set1 scores-set3
```

The REVISION and commit message use the fallback value if the active scoreset failed, and include a note for any partial rescore:

```bash
REVISION=`grep "revision .*" scores-set$SCORESET | cut -d" " -f9`
[[ -z "$REVISION" ]] && REVISION=$FALLBACK_REVISION

svn ci trunk-rulesrc-scores/ \
    -m "updated scores for revision $REVISION active rules added since last mass-check${PARTIAL_NOTE}"
```

The existing `trunk-rulesrc-scores` checkout (done proactively above) is reused for the commit; no second checkout is needed.

### 2. New helper: `masses/rule-update-score-gen/extract-scoreset`

A small Perl script to extract one scoreset column from an existing `72_scores.cf` and emit `score RULENAME VALUE` lines suitable for use as a `scores-setN` file.

`72_scores.cf` as produced by `merge-scoresets` always has exactly 4 score values per rule. The script asserts this and dies on any deviation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# extract-scoreset - extract one scoreset column from 72_scores.cf
# usage: extract-scoreset 72_scores.cf <set#>
#
# Emits "score RULENAME VALUE" lines for the requested scoreset.
# Input is the output of merge-scoresets, which always has exactly 4 values
# per score line:
#   score RULE  s0 s1 s2 s3

my ($file, $set) = @ARGV;
die "usage: extract-scoreset <72_scores.cf> <0|1|2|3>\n"
    unless defined $set && $set =~ /^[0-3]$/;

open(my $fh, '<', $file) or die "Cannot open $file: $!";
while (<$fh>) {
    next unless /^score\s+(\S+)\s+(.*)/;
    my ($name, $rest) = ($1, $2);
    $rest =~ s/\s*#.*//;   # strip comments
    my @scores = split(' ', $rest);
    die "expected 4 score values for '$name', got " . scalar(@scores) . ": $_"
        unless @scores == 4;
    printf "score %s %s\n", $name, $scores[$set];
}
close $fh;
```

### 3. `masses/rule-update-score-gen/merge-scoresets` (hardening)

Change the handling of a missing scores file so that the script dies with an explicit message rather than silently producing a corrupted merge output. A missing file at this stage means the fallback in `do-nightly-rescore-example.sh` failed to produce the expected file, so aborting is correct:

```perl
if (!-f "scores-set$i") {
    die "scores-set$i not found and no fallback was generated; aborting merge.\n";
}
open (SCORES, "scores-set$i") or die "Cannot open scores-set$i: $!";
```

Publishing a merge with a missing set defaulted to 0.001 throughout would silently disable net-enabled rules; the correct response is to abort and let the operator investigate.
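The same invariant could also be checked from the calling shell script before the merge step runs, so that a missing file is reported next to the fallback logic that should have produced it. The following is only a sketch under that assumption: `check_scoresets` is a hypothetical helper name, not part of the attached changes, and the only thing it relies on from the text above is the `scores-setN` file naming.

```bash
# Hypothetical pre-merge guard; not part of the attached patch.
# Succeeds only if scores-set0 .. scores-set3 all exist and are non-empty
# in the given directory (default: current directory).
check_scoresets() {
    dir="${1:-.}"
    missing=0
    for i in 0 1 2 3; do
        if [ ! -s "$dir/scores-set$i" ]; then
            echo "scores-set$i is missing or empty in $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_scoresets . || exit 1` just before the merge would report every missing set at once instead of stopping at the first one. The `-s` test also rejects empty files, which is a stricter condition than the `-f` test shown above.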
## Summary of Changes

| File | Change |
|------|--------|
| `masses/rule-update-score-gen/do-nightly-rescore-example.sh` | Proactive SVN checkout; run both sets independently; apply pre-fetched fallback on individual set failure; abort only if both fail; accurate commit message |
| `masses/rule-update-score-gen/extract-scoreset` (new) | Extract one scoreset column from existing `72_scores.cf`; asserts 4-value format |
| `masses/rule-update-score-gen/merge-scoresets` | Die with clear message on missing scores file rather than silently skipping |

## Notes

- The nightly run on sa-vm does `svn up masses/` before executing, so committed changes take effect the following night automatically.
- The proactive SVN checkout approach avoids conditional SVN I/O in the error path and ensures the fallback REVISION is always available.
- The `extract-scoreset` fallback preserves the last known-good scores for the failing set rather than zeroing rules or defaulting to 0.001.
- This fix does not address the underlying cause of set 1 corpus failures, but prevents them from blocking set 0 publication and allows rule bugfixes to go out regardless.
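The failure mode and the proposed control flow can be reproduced in isolation without SVN or a corpus. The sketch below is illustrative only: `gen` is a hypothetical stand-in for `generate-new-scores.sh` that fails with exit 8 for set 1 and succeeds for set 0, and each variant runs in a child `sh` so that its `set -e` state is independent of the calling shell.

```bash
# Hypothetical stand-in for generate-new-scores.sh (illustration only):
# set 1 fails with exit 8, as in the ham-starved run; set 0 succeeds.
GEN='gen() {
    if [ "$1" = 1 ]; then
        echo "set 1: insufficient ham corpus"
        return 8
    fi
    echo "set $1: scored"
}'

# Current structure: set -e is active, so the failure of set 1 ends the
# run before set 0 is attempted.
SEQ='set -e
gen 1
gen 0'

# Proposed structure: errexit is suspended around the two runs, each exit
# code is captured, and the run fails only if both sets fail.
IND='set +e
gen 1; rc1=$?
gen 0; rc0=$?
set -e
echo "rc1=$rc1 rc0=$rc0"
if [ "$rc1" -ne 0 ] && [ "$rc0" -ne 0 ]; then exit 1; fi'

seq_rc=0
seq_out=$(sh -c "$GEN
$SEQ") || seq_rc=$?

ind_rc=0
ind_out=$(sh -c "$GEN
$IND") || ind_rc=$?

echo "sequential:  exit $seq_rc"
echo "$seq_out"
echo "independent: exit $ind_rc"
echo "$ind_out"
```

With this stand-in, the sequential variant exits 8 after printing only the set 1 message, while the independent variant exits 0 and reports `rc1=8 rc0=0`, which is the point at which the real script would carry forward the fallback scores for set 1.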
