Masscheck recovery from corpus starvation

John Hardin Tue, 14 Apr 2026 20:19:49 -0700

Some code review of the RuleQA system shows that, the way it is designed, it will take a week to recover from a corpus starvation problem that affects the weekly net checks.

Masscheck runs both net and non-net scoring every day, looking back 7 days to get the last network masscheck results. RuleQA is currently not publishing rules because it is hung up on last Friday's net masscheck, which was ham-starved due to llanga's scheduling problem.


This will recover on its own by the end of the week.

Proposed code changes to bypass a starved network masscheck and reuse the current network set scores are attached.



--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [email protected]                         pgpk -a [email protected]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  People that keep dreaming about the wasteland, labyrinths and
  quick cash, die in amusing ways.                 -- Root the Dragon
-----------------------------------------------------------------------
 Today: the 161st anniversary of Lincoln's assassination

# Masscheck Rescore Failure Analysis and Proposed Fix

## Background

The nightly rescore pipeline on sa-vm runs as a single cron job at 02:25 UTC daily:

```
/usr/local/spamassassin/automc/svn/trunk/build/mkupdates/do-stable-update-with-scores force
```

This calls `do-nightly-rescore-example.sh force`, which in turn calls
`generate-new-scores.sh` twice (once for each of two scoresets), then merges
the results and commits to SVN. If the commit succeeds, `mkupdate-with-scores`
builds and publishes the signed update tarball.

## Scoreset Definitions

| Set | Description | Log pattern | mtime selection window |
|-----|-------------|-------------|------------------------|
| 0 | Non-net, non-bayes | `ham-*.log` / `spam-*.log` | `-2` days |
| 1 | Net-enabled | `ham-net-*.log` / `spam-net-*.log` | `-7` days |
| 2 | Bayes-enabled | (copied from set 0) | n/a |
| 3 | Bayes + net | (copied from set 1) | n/a |

Contributors run `nightly_mass_check` (no `--net`) daily and `weekly_mass_check`
(`--net`) on Saturdays only, controlled by `automasscheck-minimal.sh`. The
`mtime` windows are what actually control log selection at scoring time;
contributor upload timing determines which logs fall within those windows.
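
The window selection can be illustrated with the `-mtime` test of `find`. This is a minimal sketch, not the actual RuleQA code: the helper name `select_logs` and the corpus directory argument are hypothetical, and only the glob patterns and window sizes come from the table above.

```bash
#!/usr/bin/env bash
# select_logs DIR PATTERN DAYS
# Hypothetical helper: list logs in DIR whose names match PATTERN and whose
# modification time is less than DAYS*24 hours in the past.
select_logs() {
  local dir=$1 pattern=$2 days=$3
  find "$dir" -maxdepth 1 -type f -name "$pattern" -mtime "-$days" | sort
}

# Usage, with an assumed corpus path:
#   select_logs /path/to/corpus 'ham-net-*.log' 7   # set 1 window
#   select_logs /path/to/corpus 'spam-*.log' 2      # set 0 window
```

A plain `ham-*.log` glob also matches `ham-net-*.log` files; this sketch does not try to reproduce how the real scripts separate the two.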

## The Failure Mode

### Root Cause: `set -e` with Sequential Scoreset Runs

`do-nightly-rescore-example.sh` uses `set -e` and runs both scoresets
sequentially, with set 1 first on non-Sunday days:

```bash
set -e
...
else
  echo 'Not Beginning of Week.  Running with 1 first.'
  $PROGDIR/generate-new-scores.sh 1 $1   # <-- runs first Mon-Sat
  $PROGDIR/generate-new-scores.sh 0 $1   # <-- never reached if set 1 fails
  SCORESET=0
fi
```

`generate-new-scores.sh` exits with a nonzero code when corpus thresholds are
not met, e.g. exit 8 for insufficient ham message count:

```bash
if [[ "$HAMCOUNT" -lt "$MINHAMCOUNT" ]]; then
  echo "Insufficient ham corpus to generate scores; aborting."
  exit 8
fi
```

Because `set -e` is active, this nonzero exit immediately terminates
`do-nightly-rescore-example.sh`. **Set 0 never runs.**
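
The behaviour is easy to reproduce in isolation. In the sketch below, `generate` is a hypothetical stand-in for `generate-new-scores.sh` (set 1 returns status 8, set 0 would succeed); the demo script is written to a temporary file and run in a child shell so that its `set -e` does not affect the caller.

```bash
#!/usr/bin/env bash
# Minimal reproduction of the abort. "generate" is a hypothetical stand-in
# for generate-new-scores.sh: set 1 fails with status 8, set 0 would succeed.
demo=$(mktemp)
cat > "$demo" <<'EOF'
set -e
generate() {
  if [ "$1" -eq 1 ]; then
    echo "Insufficient ham corpus to generate scores; aborting."
    return 8
  fi
  echo "set $1 scored"
}
generate 1   # nonzero status: set -e terminates the script here
generate 0   # never reached
EOF

status=0
output=$(bash "$demo") || status=$?
rm -f "$demo"

echo "$output"
echo "exit status: $status"
```

Running it prints the abort message followed by `exit status: 8`; the `set 0 scored` line never appears.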

### Why Set 1 Blocks Set 0

Set 1 fails with a logged "Insufficient ham corpus" message. Set 0 logs are
produced daily, but set 0 scoring never gets the chance to run when set 1 fails.

### Cascading Effect

When set 1 fails:

1. `generate-new-scores.sh 1` exits nonzero
2. `set -e` kills `do-nightly-rescore-example.sh`
3. `do-stable-update-with-scores` sees a nonzero exit and sends a failure email
4. `mkupdate-with-scores` is never called
5. No rule update is published, including any bugfixes committed to SVN

The failure is silent from a set-0 perspective: the cron email shows only the
set-1 output, giving no indication of whether set 0 would have succeeded.

## Observed Instance

On Friday 11 April 2026, the run failed with:

```
HAM: 66682 (100000 required)
Insufficient ham corpus to generate scores; aborting.
```

This was set 1 (net). Set 0 was never attempted.

## Proposed Fix

Three files require changes and one new file is introduced.

### 1. `masses/rule-update-score-gen/do-nightly-rescore-example.sh`

**Proactively** check out the existing `72_scores.cf` and extract per-set
fallback scores at the very start of the script, before any scoring runs.
This avoids SVN I/O in the error path and provides a fallback REVISION value
without conditional logic later.

Then run both scoresets independently, capturing exit codes. If one fails,
use the pre-extracted fallback scores for that set. Abort only if both sets
fail.

```bash
# --- Proactive fallback: fetch current scores before anything can fail ---
svn co https://svn.apache.org/repos/asf/spamassassin/trunk/rulesrc/scores \
  trunk-rulesrc-scores

# Extract per-set fallback scores from the existing 72_scores.cf.
# These will be used if a scoreset run fails.
for SET in 0 1 2 3; do
  $PROGDIR/extract-scoreset trunk-rulesrc-scores/72_scores.cf $SET \
    > scores-set${SET}-fallback
done

# Fallback REVISION: the SVN revision of the existing scores checkout.
# Will be overridden below if fresh scores are generated successfully.
FALLBACK_REVISION=`svn info trunk-rulesrc-scores | \
  awk '/^Last Changed Rev:/ {print $4}'`

# --- Run both scoresets independently ---
set +e

$PROGDIR/generate-new-scores.sh 1 $1
RC1=$?
$PROGDIR/generate-new-scores.sh 0 $1
RC0=$?

set -e

if [[ $RC1 -ne 0 && $RC0 -ne 0 ]]; then
  echo "Both scoresets failed (set0 exit $RC0, set1 exit $RC1), aborting."
  exit 1
fi

# Apply fallback scores for any failed set and record which sets used fallback.
PARTIAL_NOTE=""

if [[ $RC1 -ne 0 ]]; then
  echo "Set 1 scoring failed (exit $RC1); carrying forward existing scores."
  cp scores-set1-fallback scores-set1
  cp scores-set3-fallback scores-set3
  PARTIAL_NOTE="$PARTIAL_NOTE [set1 carried forward: exit $RC1]"
fi

if [[ $RC0 -ne 0 ]]; then
  echo "Set 0 scoring failed (exit $RC0); carrying forward existing scores."
  cp scores-set0-fallback scores-set0
  cp scores-set2-fallback scores-set2
  PARTIAL_NOTE="$PARTIAL_NOTE [set0 carried forward: exit $RC0]"
fi
```

The `cp scores-setN scores-set{N+2}` steps for sets 2 and 3 only apply when
the corresponding set was freshly scored (fallback already handled sets 2 and 3
above):

```bash
[[ ! -f scores-set2 || scores-set0 -nt scores-set2 ]] && cp scores-set0 scores-set2
[[ ! -f scores-set3 || scores-set1 -nt scores-set3 ]] && cp scores-set1 scores-set3
```
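
The guard relies on the `-nt` ("newer than") file test. The same logic can be wrapped in a small function for testing; `copy_if_fresh` is a hypothetical name, not part of the existing scripts. This sketch uses an `if` so that the function returns 0 when no copy is needed, whereas a bare `[[ ... ]] && cp` list returns nonzero in that case.

```bash
#!/usr/bin/env bash
# copy_if_fresh SRC DST
# Copy SRC over DST only when DST is missing or older than SRC.
# A DST written later than SRC (for example by the fallback step) is kept.
copy_if_fresh() {
  local src=$1 dst=$2
  if [[ ! -f $dst || $src -nt $dst ]]; then
    cp "$src" "$dst"
  fi
}

# Usage, with the file names from the snippet above:
#   copy_if_fresh scores-set0 scores-set2
#   copy_if_fresh scores-set1 scores-set3
```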

The REVISION and commit message use the fallback value if the active scoreset
failed, and include a note for any partial rescore:

```bash
REVISION=`grep "revision .*" scores-set$SCORESET | cut -d" " -f9`
[[ -z "$REVISION" ]] && REVISION=$FALLBACK_REVISION

svn ci trunk-rulesrc-scores/ \
  -m "updated scores for revision $REVISION active rules added since last mass-check${PARTIAL_NOTE}"
```
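
The empty check can also be expressed with the shell's default-value expansion. A minimal sketch; `pick_revision` is a hypothetical helper used only to make the behaviour testable:

```bash
#!/usr/bin/env bash
# pick_revision PARSED FALLBACK
# Prints PARSED if it is non-empty, otherwise FALLBACK.
pick_revision() {
  printf '%s\n' "${1:-$2}"
}

# Usage, with the variable names from the snippet above:
#   REVISION=$(pick_revision "$REVISION" "$FALLBACK_REVISION")
```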

The existing `trunk-rulesrc-scores` checkout (done proactively above) is reused
for the commit, so no second checkout is needed.

### 2. New helper: `masses/rule-update-score-gen/extract-scoreset`

A small Perl script to extract one scoreset column from an existing
`72_scores.cf` and emit `score RULENAME VALUE` lines suitable for use as a
`scores-setN` file.

`72_scores.cf` as produced by `merge-scoresets` always has exactly 4 score
values per rule. The script asserts this and dies on any deviation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# extract-scoreset - extract one scoreset column from 72_scores.cf
# usage: extract-scoreset 72_scores.cf <set#>
#
# Emits "score RULENAME VALUE" lines for the requested scoreset.
# Input is the output of merge-scoresets, which always has exactly 4 values
# per score line:
#   score RULE s0 s1 s2 s3

my ($file, $set) = @ARGV;
die "usage: extract-scoreset <72_scores.cf> <0|1|2|3>\n"
    unless defined $set && $set =~ /^[0-3]$/;

open(my $fh, '<', $file) or die "Cannot open $file: $!";
while (<$fh>) {
    next unless /^score\s+(\S+)\s+(.*)/;
    my ($name, $rest) = ($1, $2);
    $rest =~ s/\s*#.*//;   # strip comments
    my @scores = split(' ', $rest);
    die "expected 4 score values for '$name', got " . scalar(@scores) . ": $_"
        unless @scores == 4;
    printf "score %s %s\n", $name, $scores[$set];
}
close $fh;
```
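
To sanity-check the expected input and output shapes without the Perl script, a rough `awk` equivalent can be run from the shell. This is a sketch only: `extract_column` is a hypothetical name, and the rule names and score values in the usage comment are made up for illustration.

```bash
#!/usr/bin/env bash
# extract_column FILE SET
# Rough awk counterpart of extract-scoreset: print "score RULE VALUE" for
# column SET (0-3) and fail if a score line does not carry exactly 4 values.
extract_column() {
  local file=$1 col=$2
  awk -v col="$col" '
    /^score[ \t]+/ {
      sub(/[ \t]*#.*/, "")          # strip trailing comments
      if (NF != 6) {                # "score" + rule name + 4 values
        print "expected 4 score values for " $2 ", got " (NF - 2) > "/dev/stderr"
        exit 1
      }
      print "score", $2, $(3 + col)
    }
  ' "$file"
}

# Usage with a made-up input line:
#   score EXAMPLE_RULE_A 1.0 2.0 3.0 4.0
# extract_column 72_scores.cf 1 then prints:
#   score EXAMPLE_RULE_A 2.0
```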

### 3. `masses/rule-update-score-gen/merge-scoresets` (hardening)

Die with an explicit message when a scores file is missing, rather than
silently producing a corrupted merge output. A missing file at this stage means
the fallback in `do-nightly-rescore-example.sh` failed to produce the expected
file, so aborting is correct:

```perl
  if (!-f "scores-set$i") {
    die "scores-set$i not found and no fallback was generated; aborting merge.\n";
  }
  open (SCORES, "scores-set$i") or die "Cannot open scores-set$i: $!";
```

Publishing a merge with a missing set defaulted to 0.001 throughout would
silently disable net-enabled rules. The correct response is to abort and
let the operator investigate.
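
The same condition could also be checked from the calling shell script before the merge is invoked. The following is a sketch under that assumption; `check_scoresets` is a hypothetical helper and is not part of the proposed patch.

```bash
#!/usr/bin/env bash
# check_scoresets [DIR]
# Succeed only if scores-set0 .. scores-set3 all exist and are non-empty
# in DIR (default: current directory). Reports every missing file.
check_scoresets() {
  local dir=${1:-.} i missing=0
  for i in 0 1 2 3; do
    if [[ ! -s $dir/scores-set$i ]]; then
      echo "scores-set$i missing or empty in $dir" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_scoresets . || exit 1
```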

## Summary of Changes

| File | Change |
|------|--------|
| `masses/rule-update-score-gen/do-nightly-rescore-example.sh` | Proactive SVN checkout; run both sets independently; apply pre-fetched fallback on individual set failure; abort only if both fail; accurate commit message |
| `masses/rule-update-score-gen/extract-scoreset` (new) | Extract one scoreset column from existing `72_scores.cf`; asserts 4-value format |
| `masses/rule-update-score-gen/merge-scoresets` | Die with clear message on missing scores file rather than silently skipping |

## Notes

- The nightly run on sa-vm does `svn up masses/` before executing, so
  committed changes take effect the following night automatically.
- The proactive SVN checkout approach avoids conditional SVN I/O in the error
  path and ensures the fallback REVISION is always available.
- The `extract-scoreset` fallback preserves the last known-good scores for the
  failing set rather than zeroing rules or defaulting to 0.001.
- This fix does not address the underlying cause of set 1 corpus failures, but
  prevents them from blocking set 0 publication and allows rule bugfixes to go
  out regardless.
