On Fri, Jan 14, 2022 at 10:36 PM Mark Miller <[email protected]> wrote:
> The true nature and state of those tests lie far deeper than pretty much
> anyone occasionally scratches with their trowel. To really take a peek,
> you have to do, at minimum, something like set up a Jenkins farm with
> half a dozen to a dozen machines of varying low- to high-end specs,
> randomize parallel overlap and test order, and actually shake the Jenga
> tower to see what falls out.

Yeah, random variation is a big enemy here. I once took the time to figure
out how many test runs are required to generate enough pass/fail data
points to detect (with statistics) whether the pass/fail rate has changed
by something like 25%... it winds up being over 100 test runs before *and*
after the change. The only thing detectable at 95% confidence in 10 runs
is something like going from a 90% fail rate to 0% fail. Anything less and
you don't have the statistical power necessary. That's why 1/10 fails had
me slightly hopeful... IIRC I found that something like that might be
significant at roughly the p=0.6 level, i.e. a >50% chance that something
actually changed rather than it being a random event (I didn't calculate
that specific case). But since my last mail I've run current main 5 times
and got 4 fails, so random is looking like the answer.
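For anyone who wants to poke at this themselves, here's a rough
back-of-the-envelope sketch of the kind of calculation I mean: a Monte
Carlo using scipy's Fisher exact test. The fail rates, run counts, and
the detection_rate helper are all made up for illustration, not measured
from our builds:

import random
from scipy.stats import fisher_exact

def detection_rate(p1, p2, n, trials=2000, alpha=0.05):
    # Fraction of simulated before/after experiments (n runs each)
    # in which Fisher's exact test flags the change from fail rate
    # p1 to fail rate p2 at significance level `alpha`.
    hits = 0
    for _ in range(trials):
        before = sum(random.random() < p1 for _ in range(n))
        after = sum(random.random() < p2 for _ in range(n))
        _, p = fisher_exact([[before, n - before],
                             [after, n - after]])
        if p < alpha:
            hits += 1
    return hits / trials

# A 25-point change (50% -> 25% fail rate):
print(detection_rate(0.50, 0.25, 10))   # weak, roughly 0.1-0.2
print(detection_rate(0.50, 0.25, 100))  # roughly 0.9+, hence "100+ runs"
# In 10 runs only a drastic shift is unambiguous:
print(fisher_exact([[9, 1], [0, 10]])[1])  # 90% -> 0% fail, p ~ 6e-5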
> That will expose a real view, rather than a narrow slit into a shifting,
> opaque, but "relatively balanced from one viewpoint at least" view.
>
> Just from a practical squeeze, many projects just push on narrowing that
> slit view and lean into more effort on keeping the structure balanced
> within it. Perhaps going as far as: run in a known Docker environment,
> minimize disturbances, record tests with light parallelism at most, and
> even treat just a master Jenkins run as the real deal. Developers, your
> luck will vary; outside of adventuring, you'll have an easier time
> letting the source-of-truth Jenkins instance dictate what fails or not.

My thinking is that laptop-sized runs are what we care about, and failure
with excess resources is less likely. So running on smaller machines with
high parallelism, to simulate a developer trying to get through the build
as fast as their laptop will go, would be my focus.

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)