Hi All,
This is a somewhat technical question, but I hope that the more statistically versed participants can help me. (BTW: references to tests are to R functions; the names of the tests indicate their purpose.)

Let's say I have a hypothesis (H0) about the proportions of colored balls in 3 color classes (e.g., green (G), white (W), and blue (B) balls) in a population of balls. That is, my H0 is not that the proportion of balls is the same in all classes, but that the classes have certain, different proportions. Further, I have one observed sample of balls from this population. I have 3 goals:

i) falsify my hypothesis about the class proportions based on the sample,
ii) indicate which of the observed color classes might cause a rejection,
iii) obtain confidence intervals for the true class proportions in the population (to be used in generating a new hypothesis in the case of rejection).

I thought of 2 possible approaches:

i) I perform an initial chisq.test on the original 3 color classes to falsify the overall hypothesis (satisfying goal i). In case of rejection, I lump the data into 2 classes (one of the original color classes vs. the remaining 2 classes lumped into one new class) and do a binom.test. I interpret the result of the binom.test as indicating whether the current original class might be the reason for the rejection of the overall H0 (goal ii). Additionally, the binom.test gives me a confidence interval for this class (goal iii).

ii) I follow a Monte Carlo-like approach: I simulate counts for the 3 classes based on the proportions of the observed counts with rmultinom (random samples from a multinomial distribution). I construct empirical (simulated) confidence envelopes and use them to falsify the overall hypothesis: if one of the hypothesized (H0) proportions falls outside the corresponding empirical confidence envelope, I reject H0. The result simultaneously satisfies all 3 goals (see above).
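To make approach i) concrete, here is a minimal sketch. It is written in Python using scipy analogues of R's chisq.test and binom.test (scipy.stats.chisquare and scipy.stats.binomtest); the counts and H0 proportions are invented for illustration only.

```python
# Approach i): overall chi-square goodness-of-fit test, then per-class
# binomial tests after lumping the other two classes into one.
# All numbers below are hypothetical example values.
from scipy.stats import chisquare, binomtest

observed = [42, 31, 27]          # hypothetical sample counts for G, W, B
n = sum(observed)
p0 = [0.5, 0.3, 0.2]             # hypothesized (H0) class proportions

# Goal i): overall goodness-of-fit test against the H0 proportions
overall = chisquare(observed, f_exp=[p * n for p in p0])
print(f"overall chi-square p-value: {overall.pvalue:.4f}")

# Goals ii) and iii): for each class, test class k vs. "everything else"
for k, (x, p) in enumerate(zip(observed, p0)):
    res = binomtest(x, n, p)                       # exact binomial test
    ci = res.proportion_ci(confidence_level=0.95)  # Clopper-Pearson CI
    print(f"class {k}: p-value {res.pvalue:.4f}, "
          f"95% CI ({ci.low:.3f}, {ci.high:.3f})")
```

Note that the per-class p-values would still need a multiplicity adjustment before interpretation, since the same data are tested three times.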
Both approaches make me feel uneasy:

i) Repeated application of the binom.test in approach i) seems to cause a multiple-testing problem. However, I might be able to deal with this with a test correction, e.g. an adjusted Bonferroni procedure.

ii) The mutual dependence of the proportions of the 3 classes might mean that I cannot construct an independent confidence envelope for each class in approach ii). Instead, I have a multi-dimensional confidence region of some odd shape because of the dependence of the classes. However, this might be taken care of by the simulations, since I do not make any assumption about the joint distribution but actually describe the real joint distribution empirically with the simulations.

iii) I feel that my approach ii) (Monte Carlo-like) might be too naive. Could a randomization test really be that 'easy'? Or do I have to use a more sophisticated test statistic (e.g., chi-square) for the multinomial and construct an empirical distribution for that statistic?

I do feel that I might be overlooking some important points here, especially with approach ii), but I don't know which. Your comments and suggestions would be highly appreciated. Even if you could just point me to the right document, that would be great.

Kind regards,
Michael Drescher
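PS: to make the idea in point iii) concrete, here is a sketch of a Monte Carlo goodness-of-fit test that uses the chi-square statistic rather than per-class envelopes. This is essentially what R's chisq.test(..., simulate.p.value = TRUE) does; the sketch below is in Python with numpy, and the counts and H0 proportions are invented examples. Sampling under H0 and collapsing each simulated sample to a single statistic sidesteps the dependence between the class proportions.

```python
# Monte Carlo goodness-of-fit test: simulate multinomial samples under H0,
# build the empirical null distribution of the chi-square statistic, and
# compare the observed statistic against it. Example numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
observed = np.array([42, 31, 27])     # hypothetical counts for G, W, B
n = observed.sum()
p0 = np.array([0.5, 0.3, 0.2])        # hypothesized (H0) proportions
expected = n * p0

def chisq_stat(counts):
    return ((counts - expected) ** 2 / expected).sum()

stat_obs = chisq_stat(observed)
sims = rng.multinomial(n, p0, size=10_000)        # samples under H0
stat_sim = np.array([chisq_stat(s) for s in sims])

# One-sided Monte Carlo p-value (with the usual +1 correction)
p_mc = (1 + (stat_sim >= stat_obs).sum()) / (1 + len(stat_sim))
print(f"Monte Carlo p-value: {p_mc:.4f}")
```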