[
https://issues.apache.org/jira/browse/CRUNCH-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629286#comment-14629286
]
Gabriel Reid commented on CRUNCH-542:
-------------------------------------
FWIW, I think that just using the seeded version of the test is fine (that's
what is done in o.a.c.lib.SampleTest). Checking that the result is within 5
standard deviations isn't that far from not checking it at all, is it?
Another option might be to make three unseeded calls to sample() and then
calculate the average.
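Here is a minimal sketch of the averaging option. It assumes the test checks the number of sampled elements against the same ~3 standard deviation limit described in the quoted issue, and it uses a normal approximation to the binomial sample size. `sample_count` and `averaged_check` are hypothetical helpers that simulate sampling with Python's `random` module. They are not part of Crunch or Scrunch.

```python
import math
import random

def sample_count(n, p, rng):
    """Number of n elements kept by one Bernoulli(p) sample.

    Hypothetical stand-in for the size of a sample() result; this is not
    the Crunch/Scrunch API.
    """
    return sum(1 for _ in range(n) if rng.random() < p)

def averaged_check(n, p, k=3.0, runs=3, rng=None):
    """Average `runs` sample sizes and accept if the mean lies within
    k single-sample standard deviations of the expected size n * p."""
    # Unseeded unless the caller passes a seeded random.Random.
    rng = rng if rng is not None else random.Random()
    mean = sum(sample_count(n, p, rng) for _ in range(runs)) / runs
    sigma = math.sqrt(n * p * (1 - p))  # std dev of one binomial count
    return abs(mean - n * p) <= k * sigma

def two_sided_tail(z):
    """P(|Z| > z) for a standard normal Z (normal approximation)."""
    return math.erfc(z / math.sqrt(2))

# The mean of `runs` independent counts has std dev sigma / sqrt(runs), so a
# fixed k-sigma tolerance acts like a (k * sqrt(runs))-sigma test on the mean.
effective_z = 3.0 * math.sqrt(3)
print(f"effective limit: {effective_z:.2f} sigma, "
      f"about 1 failure in {1 / two_sided_tail(effective_z):,.0f} runs")
```

If you pass `random.Random(seed)` as `rng`, the same check becomes the seeded variant, and its outcome is fixed for a given seed. Left unseeded, averaging three calls while keeping the tolerance at 3 single-sample standard deviations behaves like a check at about 5.2 standard deviations on the mean. That works out to roughly one spurious failure in five million runs, assuming the three calls are independent.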
> Wider tolerance for flaky scrunch PCollectionTest
> -------------------------------------------------
>
> Key: CRUNCH-542
> URL: https://issues.apache.org/jira/browse/CRUNCH-542
> Project: Crunch
> Issue Type: Improvement
> Components: Scrunch
> Affects Versions: 0.10.0, 0.11.0, 0.12.0
> Reporter: Josh Wills
> Priority: Minor
> Fix For: 0.13.0
>
> Attachments: CRUNCH-542.patch
>
>
> One of the Scrunch tests uses an unseeded version of the sample() function
> and verifies that it works correctly by checking that the number of elements
> actually sampled is within ~3 standard deviations of the expected value.
> Given this, we expect the test to fail about once every 370 runs, or about
> once a year if the tests were run every day.
> My issue is that we test about a dozen versions of Crunch automatically in
> Jenkins every day, so this test fails on at least one version about once a
> month. I'd like to bump the control limit up to a little over 5 standard
> deviations, so that the test fails around once every millennium, and/or get
> rid of the test entirely and rely only on the seeded versions of the test.
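The failure rates quoted in the issue follow from the two-sided normal tail probability P(|Z| > k) = erfc(k/√2). Here is a short sketch of that arithmetic. It assumes one independent run per tested version per day and takes the "about a dozen" versions mentioned above as exactly 12. `two_sided_tail`, `failures_per_year` and `limit_for` are illustrative names only.

```python
import math

RUNS_PER_DAY = 12  # assumption: "about a dozen" versions, one run each per day

def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z: the chance a correct
    implementation still lands outside a k-sigma control limit."""
    return math.erfc(k / math.sqrt(2))

def failures_per_year(k, runs_per_day=RUNS_PER_DAY):
    """Expected number of spurious failures per year at a k-sigma limit."""
    return 365 * runs_per_day * two_sided_tail(k)

def limit_for(years_between_failures, runs_per_day=RUNS_PER_DAY):
    """Bisect for the k at which one spurious failure is expected every
    `years_between_failures` years (failures_per_year falls as k grows)."""
    target = 1 / years_between_failures
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if failures_per_year(mid, runs_per_day) > target:
            lo = mid  # still failing too often: widen the limit
        else:
            hi = mid
    return hi

print(f"3 sigma, one run: 1 failure in {1 / two_sided_tail(3.0):.0f} runs")
# prints: 3 sigma, one run: 1 failure in 370 runs
print(f"3 sigma, 12 runs/day: {failures_per_year(3.0) / 12:.2f} failures per month")
# prints: 3 sigma, 12 runs/day: 0.99 failures per month
print(f"once a millennium: about {limit_for(1000):.2f} sigma")
```

At exactly 5 sigma, the same arithmetic gives roughly one spurious failure every four centuries across a dozen daily runs. That is why the limit has to sit a little above 5 to reach once per millennium.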
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)