Hello,

I routinely use aov and the Error term to perform analyses of variance of experiments with 'within-subject' factors. I wonder whether a notion like 'multistratum models' exists for glm models when performing a logit analysis (though I am not 100% sure that this would make sense).

I have data of an experiment where the outcome is a categorical variable:

20 individuals listened to 80 synthetic utterances (distributed across 4 types) and were asked to classify them into four categories. (The variables in the data.frame are 'subject', 'sentence', 'type', and 'response'.)

Here is the table of counts table(type,response):

     response
type   a   b   c   d
   a 181 166  42  11
   b  69 170  72  89
   c  90 174  75  61
   d  14 125  53 208
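
For anyone who wants to work with these counts directly, here is a minimal sketch that rebuilds the table in R. The object names `counts` and `tab` are only illustrative. The row totals of 400 suggest 20 sentences per type (20 subjects x 20 sentences), which is an inference from the counts rather than something stated explicitly.

```r
# Counts copied from the table above; rows are 'type', columns are 'response'.
counts <- matrix(c(181, 166, 42,  11,
                    69, 170, 72,  89,
                    90, 174, 75,  61,
                    14, 125, 53, 208),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(type = c("a", "b", "c", "d"),
                                 response = c("a", "b", "c", "d")))
tab <- as.table(counts)

# Total number of responses per type (expected: 400 each if the design is balanced).
rowSums(tab)

# Row proportions: how responses are distributed within each type.
round(prop.table(tab, margin = 1), 3)
```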


There are several questions of interest, such as, for example:


- are responses distributed in the same way for the different types?

- are the numbers of 'a' responses for the 'b' and 'c' types significantly different?

- is the proportion of 'd' over 'a' responses different for the 'b' and 'c' types?

...

(I want to make inferences for the population of potential subjects on the one hand, and on the population of potential sentences on the other hand).
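
To make the second and third questions concrete, the quantities involved can be read off the pooled counts as in the sketch below. This is purely descriptive: it pools over subjects and sentences, so it says nothing about significance. The name `d_over_a` is only illustrative.

```r
# Counts copied from the table above; rows are 'type', columns are 'response'.
counts <- matrix(c(181, 166, 42,  11,
                    69, 170, 72,  89,
                    90, 174, 75,  61,
                    14, 125, 53, 208),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(type = c("a", "b", "c", "d"),
                                 response = c("a", "b", "c", "d")))

# Number of 'a' responses for types 'b' and 'c' (second question).
counts[c("b", "c"), "a"]

# Ratio of 'd' to 'a' responses within types 'b' and 'c' (third question).
d_over_a <- counts[c("b", "c"), "d"] / counts[c("b", "c"), "a"]
d_over_a

# Ratio of the two ratios: an odds ratio for 'd' versus 'a', type 'b' versus 'c'.
unname(d_over_a["b"] / d_over_a["c"])
```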

If the responses were continuous, I would just run two one-way anovas: one with the factor type over the means by subject*type,
and the other with the factor type over the means by sentence (within type). I would then use t.test to compare different pairs of types.
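
As a sketch of what this continuous-response analysis would look like, the code below uses simulated data, since the real data are not shown here. The column `score` and its values are made up, and the layout of 20 sentences per type is an assumption based on the row totals of the table.

```r
set.seed(1)

# Simulated design (assumption): 20 subjects x 80 sentences, 20 sentences per type.
d <- expand.grid(subject = factor(1:20), sentence = factor(1:80))
d$type <- factor(c("a", "b", "c", "d")[(as.integer(d$sentence) - 1) %/% 20 + 1])
d$score <- rnorm(nrow(d), mean = as.integer(d$type))  # made-up continuous response

# By-subject analysis: one mean per subject x type; type is a within-subject factor.
by_subj <- aggregate(score ~ subject + type, data = d, FUN = mean)
by_subj <- by_subj[order(by_subj$type, by_subj$subject), ]
summary(aov(score ~ type + Error(subject / type), data = by_subj))

# By-sentence analysis: one mean per sentence; sentences are nested in type,
# so type is a between-sentence factor.
by_item <- aggregate(score ~ sentence + type, data = d, FUN = mean)
summary(aov(score ~ type, data = by_item))

# Pairwise comparison of types 'b' and 'c' across subjects (paired by subject).
type_b <- by_subj$score[by_subj$type == "b"]
type_c <- by_subj$score[by_subj$type == "c"]
t.test(type_b, type_c, paired = TRUE)
```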


Now, as the answers are categorical, I am not sure about the correct approach and how to use R to perform such an analysis.

I could treat response as a factor, and use percentages of responses per subject in each cell of response*type,
and run an anova on that... [aov(percentage ~ response*type + Error(subject/(response*type)))]. But it seems incorrect to me to use the subject's response as an independent variable (though I do not have a forceful argument).
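
A sketch of how such percentages could be computed and passed to aov is given below, using simulated responses (the real data are not shown here, and 20 sentences per type is assumed). The last lines check one property of this layout: within each subject x type cell the four percentages necessarily sum to 100, so they are not free to vary independently of one another.

```r
set.seed(2)

# Simulated design (assumption): 20 subjects x 80 sentences, 20 sentences per type.
d <- expand.grid(subject = factor(1:20), sentence = factor(1:80))
d$type <- factor(c("a", "b", "c", "d")[(as.integer(d$sentence) - 1) %/% 20 + 1])
lev <- c("a", "b", "c", "d")
d$response <- factor(sample(lev, nrow(d), replace = TRUE), levels = lev)  # made-up answers

# Count responses in every subject x type x response cell (zero cells included).
cells <- as.data.frame(table(subject = d$subject, type = d$type,
                             response = d$response))
cells$percentage <- 100 * cells$Freq / 20  # 20 sentences per subject x type

# The analysis described above, with response treated as a factor.
fit <- aov(percentage ~ response * type + Error(subject / (response * type)),
           data = cells)
summary(fit)

# Percentages within each subject x type cell always add up to 100.
sums <- tapply(cells$percentage, list(cells$subject, cells$type), sum)
range(sums)
```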


Simple Chi-square tests are not the answer either, as a given subject contributed several times (80) to the counts in the table above.
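
For reference, this is the naive test being ruled out, run on the pooled counts. It treats the 1600 responses as independent observations, which they are not, since each of the 20 subjects contributes 80 of them. The name `naive` is only illustrative.

```r
# Counts copied from the table above; rows are 'type', columns are 'response'.
counts <- matrix(c(181, 166, 42,  11,
                    69, 170, 72,  89,
                    90, 174, 75,  61,
                    14, 125, 53, 208),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(type = c("a", "b", "c", "d"),
                                 response = c("a", "b", "c", "d")))

# Pearson chi-square test of independence between type and response.
# Shown only for comparison: it ignores the repeated measures per subject.
naive <- chisq.test(counts)
naive

# Number of responses the test treats as independent, versus number of subjects.
sum(counts)
20
```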

My reading of MASS and of several other books suggests the use of logit/multinomial models when the response is categorical. But in all the examples provided, the units of analysis contribute only one measurement. Should I include the subject and sentence factors in the formula? But then they would be treated as fixed factors in the analysis, would they not?
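
To illustrate the concern, here is a minimal sketch using multinom from the nnet package (my choice for the example; it accompanies MASS). The data are simulated, with 20 sentences per type assumed. Adding subject to the formula estimates one set of coefficients per subject, that is, subject is handled as an ordinary fixed factor, which is exactly the issue raised above. This is not offered as a solution.

```r
library(nnet)  # provides multinom()

set.seed(3)

# Simulated design (assumption): 20 subjects x 80 sentences, 20 sentences per type.
d <- expand.grid(subject = factor(1:20), sentence = factor(1:80))
d$type <- factor(c("a", "b", "c", "d")[(as.integer(d$sentence) - 1) %/% 20 + 1])
lev <- c("a", "b", "c", "d")
d$response <- factor(sample(lev, nrow(d), replace = TRUE), levels = lev)  # made-up answers

# Multinomial logit with type only: every response is treated as independent.
fit_type <- multinom(response ~ type, data = d, trace = FALSE)

# Adding subject: 19 extra dummy variables per non-reference response category,
# i.e. subject enters as a fixed factor.
fit_subj <- multinom(response ~ type + subject, data = d, trace = FALSE)

dim(coef(fit_type))  # response categories (minus reference) x coefficients
dim(coef(fit_subj))

# Likelihood-ratio comparison of the two fits.
anova(fit_type, fit_subj)
```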


Any suggestion is welcome.


Christophe Pallier
www.pallier.org

