Dear R Help Community,

I have a question and an answer (based on reading this forum and online
research), but I thought I should share both since there is probably a much
better way to go about my solution. My question is specifically about how
best to visualise multiple response contingency tables. What I mean by
'multiple response' is that the total number of responses per row of a
contingency table can be greater than the total number of respondents. An
example of a multiple response table is shown below (apologies if my
formatting is incorrect or silly, I'm a hardcore R newbie):

> f.tbl = structure(c(10, 15, 25, 45, 30, 50), .Dim = 2:3, .Dimnames = structure(list(
+     Sex = c("F", "M"), Responses = c("A", "B", "total subjects")),
+     .Names = c("Sex", "Responses")), class = "table")
> f.tbl
   Responses
Sex  A  B total subjects
  F 10 25             30
  M 15 45             50


The answer I have is to adjust my data and then use the mosaic() function
in package:vcd; however, I'm not sure that's the best way forward and I
don't have a very efficient way of getting there. I will present my
solution so you guys can take a look.

The fundamental problem is that because of the multiple response data, you
can't simply apply a normal Chi-square test to the contingency table.
There's a raft of approaches, but I've decided to use a simple technique
introduced by (A. Agresti, I. Liu, Modeling a categorical variable allowing
arbitrarily many category choices, Biometrics 55 (1999) 936-43.) and
refined by Thomas and Decady, and by Bilder and Loughin. In summary, the test
statistic (a modified Chi-square statistic) is calculated by summing up the
individual chi-square statistics for each of the c marginal r × 2 tables
relating the single response variable to the multiple response variable
(with df = c(r - 1)). Note that instead of using the row totals (total
number of responses), the test statistic is calculated with the total number
of subjects per row.
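
As a rough sketch of the calculation just described, the statistic can be computed directly from the original table. The helper name mr_chisq and its total argument are made up for illustration (they are not from any package), and the sketch assumes the last column holds the number of subjects per row. It uses chisq.test(correct = FALSE) so that each 2 x 2 table gives the plain Pearson statistic rather than the continuity-corrected one that chisq.test() reports by default for 2 x 2 tables.

```r
# Counts from the example: columns A and B are responses,
# the last column is the number of subjects in each row.
f.tbl <- as.table(matrix(c(10, 15, 25, 45, 30, 50), nrow = 2,
                         dimnames = list(Sex = c("F", "M"),
                                         Responses = c("A", "B", "total subjects"))))

# Hypothetical helper (not from any package): sums the Pearson chi-square
# statistics of the c marginal r x 2 (Yes/No) tables.
mr_chisq <- function(tbl, total = "total subjects") {
  m     <- unclass(tbl)                    # plain matrix of counts
  n     <- m[, total]                      # subjects per row
  items <- setdiff(colnames(m), total)     # the c response columns
  stats <- sapply(items, function(item) {
    yes_no <- cbind(Yes = m[, item], No = n - m[, item])
    # correct = FALSE: plain Pearson statistic, no continuity correction
    unname(chisq.test(yes_no, correct = FALSE)$statistic)
  })
  list(per_item  = stats,
       statistic = sum(stats),
       df        = length(items) * (nrow(m) - 1))   # c * (r - 1)
}

res <- mr_chisq(f.tbl)
res$per_item
res$statistic
res$df
```

For the example table this gives roughly 0.097 for A plus 0.762 for B, so about 0.859 on 2 degrees of freedom. R may warn that the chi-squared approximation could be inaccurate, because one expected count in the B table is below 5.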

(phew, I hope that made sense :) ) Unfortunately, my Google research has
not revealed an easy way to transform my one data table into the c separate
r x 2 tables needed for analysis. So I end up having to create the two tables
myself, shown below (note that each Not-A/Not-B column is calculated as the
total number of subjects listed above minus the corresponding A/B column).

> g.mtrx=matrix(c(10,15,20,35),nrow=2)
> g.tbl=as.table(g.mtrx)
> dimnames(g.tbl)=list(Sex=c("F","M"),Responses=c("A","Not-A"))
> g.tbl
   Responses
Sex  A Not-A
  F 10    20
  M 15    35

> h.mtrx=matrix(c(25,45,5,5),nrow=2)
> h.tbl=as.table(h.mtrx)
> dimnames(h.tbl)=list(Sex=c("F","M"),Responses=c("B","Not-B"))
> h.tbl
   Responses
Sex  B Not-B
  F 25     5
  M 45     5
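
One possible way to avoid typing each Yes/No table by hand is to derive them all from the original table in a single step. The sketch below is only an illustration: mr_array is a made-up helper name, and it assumes the subject totals sit in a column called "total subjects". It returns a Sex x Responses x Factors array with one Yes/No slice per response column.

```r
# Counts from the example: columns A and B are responses,
# the last column is the number of subjects in each row.
f.tbl <- as.table(matrix(c(10, 15, 25, 45, 30, 50), nrow = 2,
                         dimnames = list(Sex = c("F", "M"),
                                         Responses = c("A", "B", "total subjects"))))

# Hypothetical helper: build an r x 2 x c array of Yes/No counts,
# one slice per response column, from a table with a totals column.
mr_array <- function(tbl, total = "total subjects") {
  m     <- unclass(tbl)                     # plain matrix of counts
  n     <- m[, total]                       # subjects per row
  items <- setdiff(colnames(m), total)      # the c response columns
  yes   <- m[, items, drop = FALSE]         # r x c matrix of "Yes" counts
  no    <- n - yes                          # n is recycled down each column
  out <- array(NA_real_,
               dim = c(nrow(m), 2, length(items)),
               dimnames = c(dimnames(tbl)[1],
                            list(Responses = c("Yes", "No"), Factors = items)))
  out[, "Yes", ] <- yes
  out[, "No", ]  <- no
  out
}

gh <- mr_array(f.tbl)
gh
```

With more response columns the same call should produce one extra slice per column, so nothing has to be typed by hand beyond the original table.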


If I then perform the normal Chi-square test on each of the two tables
(chisq.test()) and then sum up the results, I get the answer I want.
Clearly this is cumbersome, which is why I do it in Excel at the moment (I
know, shame on me). However, I really want to take advantage of the mosaic
function in vcd. So what I have to do at the moment is create the tables
above and use abind() (package:abind) to bring my two matrices together to
form a three-dimensional array. Example:

> library(abind)
> gh.abind = abind(g.mtrx,h.mtrx,along=3)
> dimnames(gh.abind)=list(Sex=c("F","M"),Responses=c("Yes","No"),Factors=c("A","B"))
> gh.abind
, , Factors = A

   Responses
Sex Yes No
  F  10 20
  M  15 35

, , Factors = B

   Responses
Sex Yes No
  F  25  5
  M  45  5

Now I can use the simple mosaic function to plot the combined array:

> library(vcd)
> mosaic(gh.abind)

So that's it. I don't use any Pearson residual shading in mosaic since I
don't think it would be appropriate to try and model my weird multiple
response tables (at the moment), but what I will do is look at the
odds-ratio table and then manually colour the mosaic cells with high
odds ratios (greater than 2).
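
The odds-ratio step could be sketched along these lines. It assumes each slice is a 2 x 2 table (two sexes, Yes/No), uses the cut-off of 2 mentioned above, and the colour names are arbitrary choices made for illustration.

```r
# Sex x Responses x Factors array of Yes/No counts from the example.
gh <- array(c(10, 15, 20, 35,    # Factor A: F/M Yes, then F/M No
              25, 45,  5,  5),   # Factor B: F/M Yes, then F/M No
            dim = c(2, 2, 2),
            dimnames = list(Sex = c("F", "M"),
                            Responses = c("Yes", "No"),
                            Factors = c("A", "B")))

# Sample odds ratio of each 2 x 2 slice: (F.Yes * M.No) / (F.No * M.Yes)
odds_ratio <- apply(gh, 3, function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1]))

# Flag the responses whose odds ratio exceeds the cut-off.
cutoff <- 2
high   <- odds_ratio > cutoff

# Fill colours in the same shape as the data: every cell of a flagged
# slice gets the highlight colour, all other cells stay grey.
fill <- array(rep(ifelse(high, "tomato", "grey80"), each = prod(dim(gh)[1:2])),
              dim = dim(gh), dimnames = dimnames(gh))

odds_ratio
high
```

For the example data neither odds ratio exceeds 2, so no slice is flagged. An odds ratio below 1/2 is just as strong an association in the opposite direction, so it may be worth flagging those as well. I believe the fill array can then be handed to mosaic() through its gp argument, e.g. gp = grid::gpar(fill = fill), but that is an assumption to check against the vcd documentation.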

I am literally having to type all this by hand into R, and as you can
imagine, it gets cumbersome with large multi-column tables (which I
have). Does anybody have any thoughts on my approach of using mosaic
for this sort of data? And if so, any insight on how I can be a bit
slicker with my R code?

All help is appreciated and I hope that this question wasn't too long
to read through.

All the best,
Marcos




-- 
PhD Engineering Candidate
University of Cambridge
Department of Engineering
Centre for Sustainable Development
mp...@cam.ac.uk <mp...@cam.ac.uk>
