Dear R Help Community,

I have a question and an answer (based on reading this forum and online research), but I thought I should share both, since there is probably a much better way to go about my solution. My question is specifically about how best to visualise multiple response contingency tables. By 'multiple response' I mean that the total number of responses in each row of the contingency table can be greater than the total number of respondents in that row. An example of a multiple response table is shown below (apologies if my formatting is incorrect or silly, I'm a hardcore R newbie):
> f.tbl = structure(c(10, 15, 25, 45, 30, 50), .Dim = 2:3,
+     .Dimnames = structure(list(Sex = c("F", "M"),
+         Responses = c("A", "B", "total subjects")),
+         .Names = c("Sex", "Responses")), class = "table")
> f.tbl
   Responses
Sex  A  B total subjects
  F 10 25             30
  M 15 45             50

The answer I have is to adjust my data and then use the mosaic() function in package:vcd; however, I'm not sure that's the best way forward, and I don't have a very efficient way of getting there. I will present my solution so you guys can take a look.

The fundamental problem is that, because of the multiple response data, you can't simply apply a normal Chi-square test to the contingency table. There's a raft of approaches, but I've decided to use a simple technique introduced by Agresti and Liu (A. Agresti, I. Liu, Modeling a categorical variable allowing arbitrarily many category choices, Biometrics 55 (1999) 936-43) and refined by Thomas and Decady and by Bilder and Loughin. In summary, the test statistic (a modified Chi-square statistic) is calculated by summing the individual Chi-square statistics for each of the c marginal r × 2 tables relating the single response variable (r rows) to each of the c items of the multiple response variable, with df = c(r - 1). Note that instead of using the row totals (total number of responses), the test statistic is calculated with the total number of subjects per row. (Phew, I hope that made sense :) )

Unfortunately, my google-research has not revealed an easy way to transform my one data table into c separate r × 2 tables for analysis. So I end up having to create the two different tables myself, as shown below (note that the Not-A and Not-B columns are calculated as the difference between the main data column (A or B) and the total number of subjects listed above):
> g.mtrx = matrix(c(10, 15, 20, 35), nrow = 2)
> g.tbl = as.table(g.mtrx)
> dimnames(g.tbl) = list(Sex = c("F", "M"), Responses = c("A", "Not-A"))
> g.tbl
   Responses
Sex  A Not-A
  F 10    20
  M 15    35

> h.mtrx = matrix(c(25, 45, 5, 5), nrow = 2)
> h.tbl = as.table(h.mtrx)
> dimnames(h.tbl) = list(Sex = c("F", "M"), Responses = c("B", "Not-B"))
> h.tbl
   Responses
Sex  B Not-B
  F 25     5
  M 45     5

If I then perform the normal Chi-square test on each of the two tables (chisq.test()) and sum up the resulting statistics, I get the answer I want. Clearly this is cumbersome, which is why I do it in Excel at the moment (I know, shame on me).

However, I really want to take advantage of the mosaic function in vcd. So what I have to do at the moment is create the tables above and use abind() (package:abind) to bring my two matrices together into a three-dimensional array. Example:

> library(abind)
> gh.abind = abind(g.mtrx, h.mtrx, along = 3)
> dimnames(gh.abind) = list(Sex = c("F", "M"), Responses = c("Yes", "No"),
+     Factors = c("A", "B"))
> gh.abind
, , Factors = A

   Responses
Sex Yes No
  F  10 20
  M  15 35

, , Factors = B

   Responses
Sex Yes No
  F  25  5
  M  45  5

Now I can use the simple mosaic function to plot the combined array:

> library(vcd)
> mosaic(gh.abind)

So that's it. I don't use any Pearson-residual shading in mosaic, since I don't think it would be appropriate to try to model my weird multiple response tables (at the moment). What I will do instead is look at the odds-ratio table and then manually colour the mosaic cells with high odds ratios (greater than 2).

I am literally having to type all this by hand into R, and as you can imagine, it gets cumbersome with large multi-column tables (which I have). Does anybody have any thoughts on my approach of using mosaic for this sort of data? And if so, any insight on how I can be a bit slicker with my R code?

All help is appreciated, and I hope that this question wasn't too long to read through.
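For what it's worth, the hand-typing step could perhaps be avoided along the following lines. This is only a minimal base-R sketch, not a tested or recommended method: split_responses() is a hypothetical helper name made up here (it is not from vcd, abind or any other package), it assumes the subject totals sit in a column called "total subjects", and the odds-ratio line assumes exactly two rows. Passing correct = FALSE to chisq.test() is also an assumption on my part: by default chisq.test() applies a continuity correction to 2 × 2 tables, which would not give the plain Pearson statistics being summed.

```r
# The multiple-response table from the post: counts for A and B,
# plus the number of subjects in each row.
f.tbl <- as.table(matrix(c(10, 15, 25, 45, 30, 50), nrow = 2,
  dimnames = list(Sex = c("F", "M"),
                  Responses = c("A", "B", "total subjects"))))

# Hypothetical helper (not from any package): build one r x 2
# Yes/No table per response item and stack them into an
# r x 2 x c array, as was done by hand with abind().
split_responses <- function(tbl, total = "total subjects", row_var = "Sex") {
  items <- setdiff(colnames(tbl), total)   # response items, e.g. A and B
  n <- tbl[, total]                        # subjects per row
  dn <- list(rownames(tbl), c("Yes", "No"), items)
  names(dn) <- c(row_var, "Responses", "Factors")
  arr <- array(NA_real_, dim = c(nrow(tbl), 2L, length(items)),
               dimnames = dn)
  for (k in items) {
    arr[, "Yes", k] <- tbl[, k]
    arr[, "No", k]  <- n - tbl[, k]        # Not-k = subjects minus k
  }
  arr
}

gh <- split_responses(f.tbl)

# One Pearson chi-square statistic per marginal r x 2 table.
# correct = FALSE (an assumption) switches off the continuity correction.
# chisq.test() may warn about small expected counts in the B table.
stats <- apply(gh, 3, function(m) chisq.test(m, correct = FALSE)$statistic)

X2 <- sum(stats)                           # modified chi-square statistic
df <- dim(gh)[3] * (dim(gh)[1] - 1)        # df = c(r - 1)

# p-value from the chi-square approximation with the df stated in the post
p.value <- pchisq(X2, df, lower.tail = FALSE)

# Odds ratio for each item (valid only for 2 x 2 slices)
odds <- apply(gh, 3, function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1]))
```

If vcd is loaded, mosaic(gh) should then plot the same array as the hand-built gh.abind, and the same helper would apply unchanged to tables with more response columns.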
All the best,
Marcos

--
PhD Engineering Candidate
University of Cambridge
Department of Engineering
Centre for Sustainable Development
mp...@cam.ac.uk
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.