Re: [R] How to test frequency independence (in a 2 by 2 table) withmany missing values

Tal Galili Fri, 24 Jul 2009 16:09:26 -0700

Hello dear Robert and Daniel,
I read your replies with much interest, thank you very much for them!


Dear Daniel,
Your approach is, as Robert said, "whopped me alongside the head" as well.
So much so that I fear it will be much more then my current level of
understanding can take for the project I have at hand.
For the time being (and as much as it saddens me) I will not pursue this
as rigorously as it deserves, and hope to look at it again after I will do
my homework of studying mixed model do the degree of understanding your
reply.
I will thank you for mentioning the use of Monte-Carlo, since although I had
the chance of working with it on other situations, somehow it didn't occur
to me to just have a go at it for the question I brought up in this e-mail.
So thank you again for that, and for the very interesting response.

Dear Robert,
With response to your comments in the second e-mail, here are a few
questions:
1) Do you think I could do LRT (based on ML) by using prop.test on the
marginals of the tables?
I have to admit I am not fully sure how this is different then using the
fisher.test on the marginals of the tables (I have the feeling this is
something very basic of which I am showing my ignorance about, so I
apologize for that).
Regarding the resampling alternative, that I am not sure how to try
and do. Could
you give me a a pointer or two on how to try and use the survival package
for my situation? (I don't have any experience using that package for this
purpose, and some general direction would be of great help).
2) I had the feeling that the mcnemar test would be a poor choice, which is
what led me to writing to the help group :)
3) Regarding the assumptions of the mixed models, that is in part what
"scares" me. And since I (for now) don't have experience using these models,
I am not so keen on taking them on just yet.
4) I didn't think about using Bayesian methods for it. I admit I never tried
them in R. If you have a name or two to drop me, I would love to go through
it, at least for the educational value of it.
5) Doing this was exactly my thought, but I wasn't sure how much this would
be valid. It sounds from what you wrote that it might be. So what I will do
(assuming I will receive a bit of help on some of the tests you brought up)
is to try and compare this method in section "5)" you wrote to the method(s)
in "1)"  (using Monte-Carlo) , so to see which one might work better.
6) I am sorry to say that the data I am analyzing wouldn't give me concordant
pairs. But regarding your question - I would love to be able to find a
solution for "how to obtain a valid confidence interval for the ratio of
proportions for the two tests". But don't know where to even begin on that
one. Any insights would be of help.


Both Robert and Daniel, I am amazed at the depth and length of your
responses and would like to thank you again for your generosity.
With much regards,
Tal








On Fri, Jul 24, 2009 at 11:43 PM, Robert A LaBudde <r...@lcfltd.com> wrote:

> Daniel's response about using a mixed model "whopped me alongside the
> head", so now I'm thinking more clearly. Here are a few more comments:
>
> 1. The LRT based on ML or the resampling alternative is probably the most
> powerful among reasonable algorithms to do a test on your problem. You might
> even get a survival package to do it for you.
>
> 2. The McNemar test with casewise deletion should work well enough, so long
> as you don't lose too much data (e.g., less than 50%). Unfortunately, this
> seems to be your situation.
>
> 3. The mixed model (or, more properly, the generalized mixed model)
> approach will involve a number of new assumptions not necessary in more
> direct methods.
>
> 4. A Bayesian approach would be better than 1) above if you've got some
> useful prior information.
>
> 5. If you have a lot of missing data (> 50%), as you appear to have from
> your example,   you generally will get more power than the McNemar test from
> just ignoring the matching on subjects, treating it as unpaired data (2
> independent samples). Not as good as method 1), but the high missing rate
> would justify the independence assumption, as it would reduce severely
> whatever correlation is present on the same vs. different subjects. The
> extra data may be worth more than the pairing, particularly if there is a
> large percentage of concordant pairs in the McNemar test of symmetry.
>
> 6. Finally, why are you even doing a test? Tests don't provide physically
> useful information, only that your study has enough subjects to
> statistically detect an effect or not. Wouldn't you be better off asking how
> to obtain a valid confidence interval for the difference or ratio of
> proportions for the two tests? The concordant pairs would then be useful.
>
>
>
> At 03:17 PM 7/24/2009, Daniel Malter wrote:
>
>>
>> Hi Tal, you can use the lme4 library and use a random effects model. I
>> will
>> walk you through the example below (though, generally you should ask a
>> statistician at your school about this). At the very end, I will include a
>> loop for Monte-Carlo simulations that shows that the estimation of the
>> fixed
>> effects is unbiased, which assures you that you are estimating the
>> "correct"
>> coefficient. In addition, as Robert says, you should remove all subjects
>> for
>> which both observations are missing.
>>
>> ##First, we simulate some data.
>> ##Note how the data structure matches the data structure you have:
>>
>> id=rep(1:100,each=2)
>> #the subject ID
>>
>> test=rep(0:1,100)
>> # first or second measurement/test (your variable of interest)
>>
>> randeff=rep(rnorm(100,0,1),each=2)
>> # a random effect for each subject
>>
>> linear.predictor=randeff+2*test
>> #the linear predictor
>> #note the coefficient on the fixed effect of "test" is 2
>>
>> probablty=exp(linear.predictor)/(1+exp(linear.predictor))
>> #compute the probability using the linear predictor
>>
>> y=rbinom(200,1,probablty)
>> #draw 0/1 dependent variables
>> #with probability according to the variable probablty
>>
>> miss=sample(1:200,20,replace=FALSE)
>> #some indices for missing variables
>>
>> y[miss]=NA
>> #replace the dependent variable with NAs for the "miss"ing indices
>>
>>
>> ##Some additional data preparation
>>
>> id=factor(id)
>> #make sure id is coded as a factor/dummy
>>
>> mydata=data.frame(y,test,id)
>> #bind the data you need for estimation in a dataframe
>>
>>
>> ##Run the analysis
>>
>> library(lme4)
>> reg=lmer(y~test+(1|id),binomial,data=mydata)
>> summary(reg)
>>
>>
>>
>> ##And here are the Monte-Carlo simulations
>>
>> library(lme4)
>>
>> bootstraps=200
>> estim=matrix(0,nrow=bootstraps,ncol=2)
>>
>> for(i in 1:bootstraps){
>> id=rep(1:100,each=2)
>> test=rep(0:1,100)
>> randeff=rep(rnorm(100,0,1),each=2)
>> linear.predictor=randeff+2*test
>> probablty=exp(linear.predictor)/(1+exp(linear.predictor))
>> y=rbinom(200,1,probablty)
>> miss=sample(1:200,20,replace=FALSE)
>> y[miss]=NA
>> id=factor(id)
>> mydata=data.frame(y,test,id)
>> reg=lmer(y~test+(1|id),binomial,data=mydata)
>> estim[i,]=...@fixef
>> }
>>
>> mean(estim[,1])
>> #mean estimate of the intercept from 200 MC-simulations
>> #should be close to zero (no intercept in the linear.predictor)
>>
>> mean(estim[,2])
>> #mean estimate of the 'test' fixed effect from 200 MC-simulations
>> #should be close to 2 (as in the linear.predictor)
>>
>> #plot estimated coefficients from 200 MC simulations
>> par(mfcol=c(1,2))
>> hist(estim[,1],main="Intercept")
>> hist(estim[,2],main="Effect of variable 'test'")
>>
>> HTH,
>> Daniel
>>
>> -------------------------
>> cuncta stricte discussurus
>> -------------------------
>>
>> -----Ursprüngliche Nachricht-----
>> Von: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> Im
>> Auftrag von Tal Galili
>> Gesendet: Friday, July 24, 2009 1:23 PM
>> An: r-help@r-project.org
>> Betreff: [R] How to test frequency independence (in a 2 by 2 table)
>> withmany
>> missing values
>>
>> Hello dear R help group.
>>
>> My question is statistical and not R specific, yet I hope some of you
>> might
>> be willing to help.
>>
>> *Experiment settings*:  We have a list of subjects. each of them went
>> through two tests with the answer to each can be either 0 or 1.
>> *Goal*: We want to know if the two experiments yielded different results
>> in
>> the subjects answers.
>> *Statistical test (and why it won't work)*: Naturally we would turn to
>> performing a mcnemar test. But here is the twist: we have missing values
>> in
>> our data.
>> For our purpose, let's assume the missingnes is completely at random, and
>> we
>> also have no data to use for imputation. Also, we have much more missing
>> data for experiment number 2 then in experiment number 1.
>>
>> So the question is, under these settings, how do we test for experiment
>> effect on the outcome?
>>
>> So far I have thought of two options:
>> 1) To perform the test only for subjects that has both values. But they
>> are
>> too scarce and will yield low power.
>> 2) To treat the data as independent and do a pearson's chi square test
>> (well, an exact fisher test that is) on all the non-missing data that we
>> have. The problem with this is that our data is not fully independent
>> (which
>> is a prerequisite to chi test, if I understood it correctly). So I am not
>> sure if that is a valid procedure or not.
>>
>> Any insights will be warmly welcomed.
>>
>>
>> p.s: here is an example code producing this scenario.
>>
>> set.seed(102)
>>
>> x1 <- rbinom(100, 1, .5)
>> x2 <- rbinom(100, 1, .3)
>>
>> X <- data.frame(x1,x2)
>> tX <- table(X)
>> margin.table(tX,1)
>> margin.table(tX,2)
>> mcnemar.test(tX)
>>
>> put.missings <- function(x.vector, na.percent) { turn.na <-
>> rbinom(length(x.vector), 1, na.percent)  x.vector[turn.na == 1] <- NA
>> return(x.vector)
>> }
>>
>>
>> x1.m <- put.missings(x1, .3)
>> x2.m <- put.missings(x2, .6)
>>
>> tX.m <- rbind(table(na.omit(x1.m)), table(na.omit(x2.m)))
>> fisher.test(tX.m)
>>
>>
>>
>>
>> With regards,
>> Tal
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> ----------------------------------------------
>>
>>
>> My contact information:
>> Tal Galili
>> Phone number: 972-50-3373767
>> FaceBook: Tal Galili
>> My Blogs:
>> http://www.r-statistics.com/
>> http://www.talgalili.com
>> http://www.biostatistics.co.il
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
>
> "Vere scire est per causas scire"
> ================================================================
>
>


-- 
----------------------------------------------


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to test frequency independence (in a 2 by 2 table) withmany missing values

Reply via email to