Re: masscheck process timing

Kevin Golding Mon, 01 Aug 2016 01:22:44 -0700

On Sun, 31 Jul 2016 22:00:11 +0100, John Hardin <[email protected]> wrote:

Folks:
It looks like we again didn't get a successful weekly masscheck, even though if you check the counts today they are above the thresholds.
I suspect this is happening due to some results being submitted "late".
I think we might want to look into making a change to the masscheck timing rules, specifically: the cutoff for having enough corpora to run the scoring and produce a rules update is not a specific time, but is instead related to the following masscheck run.

Change is good. I must admit it's getting a bit frustrating seeing these runs result in nothing at all.

In other words:
There is still a cutoff time for the masscheck run, but it only means "the scoring won't start prior to this time."
If the corpora are above the thresholds when this time is reached, the scoring and update process commences immediately.
If not, that doesn't mean we've missed an update, at least not yet.
If another result set comes in for that pass, and that result set pushes it over the thresholds, then we can start the scoring and rule generation process.
The actual hard cutoff for pass X would be sometime after pass X+1 starts. Perhaps if the cutoff time for pass X+1 is reached and pass X is still waiting, then we give up on pass X.
This way, a late result set that satisfies the thresholds will just delay the rule generation, not prevent it.
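
The proposal above can be read as a small decision rule. What follows is only a sketch of that reading, not how the masscheck scripts actually work: the names `Action`, `Corpus` and `decide`, and the idea of passing the thresholds in as `ham_min` and `spam_min`, are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Action(Enum):
    WAIT = "wait"        # keep waiting for more result sets
    SCORE = "score"      # start scoring and rule generation
    GIVE_UP = "give_up"  # abandon this pass


@dataclass
class Corpus:
    """Message counts submitted so far for one pass (hypothetical type)."""
    ham: int
    spam: int


def decide(now: datetime, cutoff: datetime, next_cutoff: datetime,
           corpus: Corpus, ham_min: int, spam_min: int) -> Action:
    """Decide what to do with pass X at time `now`.

    cutoff      -- scoring for pass X never starts before this time
    next_cutoff -- cutoff of pass X+1, assumed here to be the hard deadline
    ham_min, spam_min -- thresholds, supplied by the caller (assumed values)
    """
    if now < cutoff:
        return Action.WAIT
    if now >= next_cutoff:
        # Pass X is still waiting when pass X+1 reaches its cutoff.
        return Action.GIVE_UP
    if corpus.ham >= ham_min and corpus.spam >= spam_min:
        return Action.SCORE
    # Below thresholds, but a late result set may still arrive.
    return Action.WAIT
```

Calling `decide` again each time a result set arrives gives the "delay, not prevent" behaviour: the answer stays `WAIT` until either the counts cross the thresholds or the next pass's cutoff is reached.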

I like that there's an effort to still push the updates out as early in the day as possible with this system. The simplest option is no doubt to just delay the score generation, even to the point of giving a whole 24 hours if need be, as it would at least result in something fairly reliable. This seems a good hybrid approach though.

This can use some refinement:

Some good thoughts, but ones that I fear may prove an obstacle to getting a change in place. Perhaps things for a wishlist instead?

If we've started scoring and another result set for that pass comes in, do we incorporate that into the score generation? We probably should; the decision could be based on when the delayed results come in (we don't want to keep resetting the scoring process and collide with the following pass) and how large the new results are (we might want to ignore a late small result set, but incorporate a late large result set).

As it stands I'm inclined to take the route that anything submitted after the run has started gets lost. This is no different to the current situation (as I understand it anyway), so it's not penalising anyone, but it also doesn't grant further concessions. Adding in new results just seems a way to potentially further delay an already delayed process.

Much as the additional data is beneficial, it seems added complexity for no gain. Given how tight the ham threshold is most days (there are a lot of days in the 140k-150k region), a large result set is unlikely to arrive after the threshold has been met anyway; it's far more likely to be the trigger. If we start dividing large and small we need to pick a point and draw a line, and potentially discourage submissions from people who feel they aren't important enough.

I'd also note that when you look at the uploads you have people like axb who submit multiple times in small groups - that is always an option to people if they feel something is important enough to beat the threshold.

If we're still running a score generation for pass X and pass X+1 has reached its cutoff and has enough corpora to satisfy the thresholds and immediately start the scoring process, do we give up on processing pass X? I would think yes.

I don't know how long the process takes, but if we never start a pass by the time the next day's start point comes I would assume it would never overlap. I could be wrong, but it seems likely that a hard cutoff that shouldn't overlap the next day's start may be simpler. At some point we need to give up hope on a day's results anyway, so that may be the guideline for when that time is.
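
To make the simpler variant concrete, here is a rough sketch of the two rules I'm suggesting: result sets that arrive once scoring has started are ignored, and a pass that hasn't started scoring by the next day's start point is dropped. The function names and arguments are hypothetical, and I'm assuming the next pass's start time is known up front.

```python
from datetime import datetime
from typing import Optional


def accept_submission(submitted_at: datetime,
                      scoring_started_at: Optional[datetime]) -> bool:
    """A result set counts only if it arrives before scoring has started.

    scoring_started_at is None while the pass is still waiting.
    """
    return scoring_started_at is None or submitted_at < scoring_started_at


def pass_expired(now: datetime, next_pass_start: datetime,
                 scoring_started_at: Optional[datetime]) -> bool:
    """Give up on a pass that never started scoring before the next one begins."""
    return scoring_started_at is None and now >= next_pass_start
```

With these two checks there is never a reset of a running score generation, so the only way a pass is lost is by not reaching the thresholds before the next day's start.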
