Hi All,

This is a somewhat technical question, but I hope that the more
statistically versed participants can help me. (BTW: references to tests
are to R functions; the test names indicate their purpose.)


Let's say I have a hypothesis (H0) about the proportions of colored balls in 3
color classes (e.g., green (G), white (W), and blue (B) balls) in a population
of balls (i.e., my H0 is not that the proportion of balls is the same for all
classes, but that the classes have certain, different proportions). Further, I
have one observed sample of balls from this population. I have 3 goals:

i) falsify my hypothesis about the proportions of the classes based on the 
sample,

ii) indicate which of the observed color classes might cause a rejection,

iii) obtain confidence intervals for the true proportions of the classes in the 
population (to be used in the generation of a new hypothesis in the case of 
rejection).
  

I thought of 2 possible approaches:

i) I perform an initial chisq.test on the original 3 color classes to test the
overall hypothesis (goal i). In case of rejection, I lump the data into 2
classes (one original color class versus the remaining 2 classes combined) and
do a binom.test; I repeat this for each color class in turn. I interpret the
result of each binom.test as indicating whether that class might be the reason
for the rejection of the overall H0 (goal ii). Additionally, the binom.test
gives me a confidence interval for each class (goal iii). (See the first
sketch after this list.)

ii) I follow a Monte Carlo-like approach: I draw random samples from a
multinomial distribution with rmultinom, using the observed proportions as the
class probabilities. From the simulated proportions I construct empirical
confidence envelopes for the 3 classes and use them to falsify the overall
hypothesis: if one of the hypothesized (H0) proportions falls outside the
corresponding empirical confidence envelope, I reject the H0. The result
simultaneously satisfies all 3 goals (see the second sketch after this list).
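
For concreteness, here is a minimal sketch of approach i) in R; the counts
and H0 proportions are made-up example values:

  obs <- c(G = 40, W = 35, B = 25)   # observed counts (made-up numbers)
  p0  <- c(0.5, 0.3, 0.2)            # hypothesized proportions under H0
  n   <- sum(obs)

  chisq.test(obs, p = p0)            # goal i: overall test of H0

  ## goals ii and iii: one binom.test per class, other classes lumped
  for (k in seq_along(obs)) {
    bt <- binom.test(obs[k], n, p = p0[k])
    cat(names(obs)[k], ": p =", bt$p.value,
        "  95% CI =", bt$conf.int, "\n")
  }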
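And a minimal sketch of approach ii), reusing the same made-up obs and p0:

  nsim <- 10000
  ## simulated class proportions, centered on the observed proportions
  sim <- rmultinom(nsim, size = n, prob = obs / n) / n   # 3 x nsim matrix

  ## per-class empirical 95% envelopes (lower and upper row)
  env <- apply(sim, 1, quantile, probs = c(0.025, 0.975))

  ## TRUE where a hypothesized proportion falls outside its envelope
  p0 < env[1, ] | p0 > env[2, ]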


Both approaches make me feel uneasy:

i) Repeated application of the binom.test in approach i) seems to cause a
multiple testing problem. However, I might be able to deal with this with a
p-value correction, e.g., Bonferroni or Holm's adjusted Bonferroni (see the
p.adjust example after this list).

ii) The mutual dependence of the proportions of the 3 classes might mean that
I cannot construct an independent confidence envelope for each class in
approach ii). Instead, I have a multi-dimensional 'confidence region' of some
odd shape because of the dependence of the classes. However, this might be
taken care of by the simulations, since I do not make any assumption about the
joint distribution but actually describe the real joint distribution
empirically with the simulations.

iii) I feel that my approach ii) (Monte Carlo-like) might be too naive. Could
a randomization test really be that 'easy'? Or do I have to use a more
sophisticated test statistic (e.g., chi-squared) for the multinomial and
construct an empirical distribution for that statistic? (See the last sketch
after this list.)
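
For concern i), the per-class p-values from approach i) could be corrected
directly with p.adjust (again using the made-up obs, p0, and n from the
sketches above):

  pvals <- sapply(seq_along(obs),
                  function(k) binom.test(obs[k], n, p = p0[k])$p.value)
  p.adjust(pvals, method = "holm")   # or method = "bonferroni"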
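For concern iii), as far as I understand, chisq.test can already simulate the
distribution of the chi-squared statistic under H0 via simulate.p.value; a
hand-rolled version would look like this (same made-up obs and p0):

  ## built-in: compare the observed chi-squared statistic with its
  ## simulated distribution under H0
  chisq.test(obs, p = p0, simulate.p.value = TRUE, B = 10000)

  ## hand-rolled equivalent (parametric bootstrap under H0)
  stat <- function(x) sum((x - n * p0)^2 / (n * p0))
  sims <- replicate(10000, stat(rmultinom(1, size = n, prob = p0)))
  mean(sims >= stat(obs))            # empirical p-value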
 
 
I do feel that I might be overlooking some important points here, especially
with approach ii), but I don't know what they are. Your comments and
suggestions would be highly appreciated. Even if you could just point me to
the right document, that would be great.
 
 
Kind regards, Michael Drescher

