@Joe - Brilliant! Spot-on. This serves to reassure me immensely

But what are the odds of both Haubo and I being wrong?
(…just joking.)

Thanks for those refs. I was looking for a proper introductory
treatment of The Lady Tasting Tea (TLTT) problem, and couldn't find
one.
At least, not with real figures, applicable to practical problems of
human performance testing.
This led me to do my own experiments, essentially replicating what you
and Haubo have set down here.

This still leaves me with the problem of writing a Jwiki page
justifying it all in layperson's terms.
I think you'll agree that Haubo's introduction to the power of a test
is totally indigestible to a layperson, even if carefully teaspooned
in.
I'll see if my son Max and I can't work on that -- to manufacture some
statistical babyfood that's intellectually nourishing.

It's intriguing to realize that, if we confine ourselves to the cases
of 10 or 20 trials, the scoring algorithm is absurdly simple. Just a
lookup table for the number of trials. It's also intriguing to see how
14 successes out of 20 only just fails to reject H0 at the 5% level. I
knew experimental psychologists who used to get upset about that, and
wanted to stretch the point. They'd say: oh, 5% is an arbitrary
figure. Why not say it's 6% which lets us pass this borderline case?
University boards-of-studies (in the UK) have this sort of committee
discussion all the time about examination results.

But, IMO, a test is a test -- and if you fail, you fail. To go
changing the threshold *after the experiment* is like cheating at
Patience.

9 out of 10 --> pass
8 out of 10 --> fail
15 out of 20 --> pass
14 out of 20 --> fail …eh?
The weight of statistical authority underlying this simple scoring
system deserves to be more widely known.
(Or have I just uttered a blasphemous heresy?)

Ian

On Wed, Feb 25, 2015 at 12:52 PM, Joe Bogner <[email protected]> wrote:
> Ian, it looks like your implementation is correct
>
> Here is some reference material on the subject:
>
> [1] -
> http://www.cran.r-project.org/web/packages/sensR/vignettes/methodology.pdf
> [2] - http://orbit.dtu.dk/files/12270008/phd271_Rune_Haubo_net.pdf
>
>
> Here is a relevant section from [1]
>
> The power of a test is the probability of getting a significant result for
> a particular test given
> data, significance level and a particular difference. In other words, it is
> the probability of
> observing a difference that is actually there. Power and sample size
> calculations require that
> a model under the null hypothesis and a model under the alternative
> hypothesis are decided
> upon. The null model is often implied by the null hypothesis and is used to
> calculate the
> critical value. The alternative model has to lie under the alternative
> hypothesis and involves
> a subject matter choice. Power is then calculated for that particular
> choice of alternative
> model.
>
> In the following we will consider calculation of power and sample size
> based directly on the
> binomial distribution. Later we will consider calculations based on a
> normal approximation
> and based on simulations.
>
> ....
>
> Example: Consider the situation that X = 15 correct answers are observed
> out of n = 20
> trials in a duo-trio test. The exact binomial p-value of a no-difference
> test is P(X ≥ 15) =
> 1 − P(X ≤ 15 − 1) = 0.021, where X ∼ binom(0.5, 20) so this is significant.
> If on the other
> hand we had observed X = 14, then the p-value would have been P(X ≥ 14) =
> 0.058, which
> is not significant. We say that xc = 15 is the critical value for this
> particular test on the
> α = 5% significance level because xc = 15 is the smallest number of correct
> answers that
> renders the test significant.
> In R we can find the p-values with
>> 1 - pbinom(q = 15 - 1, size = 20, prob = 0.5)
> [1] 0.02069473
>> 1 - pbinom(q = 14 - 1, size = 20, prob = 0.5)
> [1] 0.05765915
>
>
> As a difference, J's binomialprob already seems to do the X-1...
>
>    binomialprob 0.5,20,15
> 0.0206947
>
>    binomialprob 0.5,20,14
> 0.0576591
>
> Using your methods:
>
> 20 pH0 15
> 0.0206947
>
> 20 pH0 14
> 0.0576591
>
> (20 rejectH0 15)
> 1 --> you can tell the difference
>
> (20 rejectH0 14)
> 0 --> you cannot tell the difference
>
> It seems to match the output of that reference paper and R implementation
>
>
>
>
> On Wed, Feb 25, 2015 at 1:39 AM, 'Pascal Jasmin' via Programming <
> [email protected]> wrote:
>
>> sorry did not test,
>>
>> (0.025 < 4 : 'binomialprob 0.5, x, (-:x) + | (-:x) - y' )
>>
>>
>>
>> ----- Original Message -----
>> From: 'Pascal Jasmin' via Programming <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Cc:
>> Sent: Wednesday, February 25, 2015 1:36 AM
>> Subject: Re: [Jprogramming] Tea-tasting for non-statisticians
>>
>> binomialprob returns the probability that the number of successes will be
>> >: given amount.
>>
>> This is not quite the same as confidence intervals.  The latter (for 95%
>> CI) is that the number of successes is outside 2 standard deviations from
>> the mean (0.5)
>>
>>
>>
>> (0.025 < 4 : 'binomialprob 0.5, x, x + | x-y' )
>>
>> where 1 is reject.
>>
>>
>> ----- Original Message -----
>> From: Ian Clark <[email protected]>
>> To: [email protected]
>> Cc:
>> Sent: Tuesday, February 24, 2015 9:46 PM
>> Subject: [Jprogramming] Tea-tasting for non-statisticians
>>
>> Addon 'stats/base/distribution' defines the verb: binomialprob.
>> Am I using it correctly?
>> Please cast a beady eye over my train of thought, as I've set it out
>> below...
>>
>> I've written an app in J to administer a double-blind test which
>> reruns the classic experiment described by David Salsburg in "The Lady
>> Tasting Tea". But in place of pre- and post-lactary tea, I play a
>> snatch of one of two soundfiles in a series of 10 trials to ascertain
>> if she really can tell the difference.
>>
>> From her answers I compute 2 numbers:
>>    N=: number of trials (typically 10 or 20)
>>    s=: number of successes.
>> I have also set up an adjustable parameter:
>>    CONFIDENCE=: 0.95        NB. (the 95% confidence limit)
>>
>> Instead of using binomialprob directly, I define 2 verbs:
>>    pH0=: 4 : 'binomialprob 0.5,x,y'
>>    rejectH0=: 4 : '(1-CONFIDENCE) > x pH0 y'
>>
>>    (p=. N pH0 s) is the likelihood of s arising at random under the
>> "null hypothesis" (H0), viz that she's just guessing with
>> probability=0.5 of success.
>>    (N rejectH0 s) returns 1 iff p is too low, as determined by
>> CONFIDENCE, implying the null hypothesis (H0) can safely be rejected.
>> This Boolean value triggers one of 2 messages:
>>    1 --> You can tell the difference.
>>    0 --> You're just guessing.
>>
>> That's straining the epistemology, I know. But I don't expect a
>> non-statistician to make much sense of a statistically kosher message,
>> such as:
>>    0 --> This program has decided that the (null) hypothesis that your
>> results have arisen by pure guesswork cannot be safely rejected on the
>> evidence alone of these 10 trials.
>>
>> There's gratifying interest in the music/audiophile community in such
>> a sound-test, if it's packaged up and made easy-to-use. Questions
>> like: "can you hear any improvement if you use an optical cable
>> instead of a USB one?" come up all the time. And, as I've discovered
>> for myself, even a strong impression of improvement may not stand up
>> to this sort of scrutiny. It's the placebo effect.
>>
>> So not only do I need to assure the soundness of the statistical
>> theory and its J implementation, plus my use of it, but also publish a
>> proper write-up (in Jwiki) which makes sense to a non-statistician.
>> This after all is *the* foundation experiment in the history of
>> Statistics.
>> But AFAICS its treatment in Wikipedia leaves much to be desired.
>> See for instance: https://en.wikipedia.org/wiki/Binomial_test
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to