Hi,

r-help-boun...@r-project.org wrote on 17.08.2011 21:07:43:
>
> Dear Michael,
>
> Thanks a lot for your reply and for your help. I was struggling so much, but
> your suggestion showed me a path to the solution of my problem. I have
> tried your code on my data frame step by step and it looks fine to me. But
> when I tried the chi-square test:
>
> res=chisq.test(y1[id],p=y2[id],rescale.p=T)
>
> Chi-squared test for given probabilities
>
> data: y1[id]
> X-squared = NaN, df = 19997, p-value = NA
>
> Warning message:
> In chisq.test(y1[id], p = y2[id], rescale.p = T) :
> Chi-squared approximation may be incorrect

Check what Y1[id] is: it is one vector holding two values for every row, so
chisq.test() runs a single test over all of them. Split Y1[id] and Y2[id]
into lists, one element per row:

l1 <- split(Y1[id], rep(seq_len(nrow(Y1)), each = 2))
l2 <- split(Y2[id], rep(seq_len(nrow(Y1)), each = 2))

and do mapply on those lists. But the result is rather silly, as Michael pointed out.

# pass the expected values as p; chisq.test(x, y) would cross-tabulate x and y
mapply(function(o, e) chisq.test(o, p = e, rescale.p = TRUE), l1, l2, SIMPLIFY = FALSE)

or, to get only the p-values:

sapply(mapply(function(o, e) chisq.test(o, p = e, rescale.p = TRUE), l1, l2, SIMPLIFY = FALSE), "[[", "p.value")

Regards
Petr

>
> It is not giving a p-value. Then I checked the observed and expected values; it is
> taking all the numbers into consideration. But as I mentioned earlier, I want a
> p-value for each row, and therefore the degrees of freedom will be 1. Example:
>
> I have a data frame with 8 columns:
>   V1 V2 V3 V4 W1 W2 W3 W4
> 1  0 84 22 10  0 84  0  0
> 2 35 84  0  0 22 84  0  0
> 3  0  0  0 48  0  0  0 48
> 4  0 48  0  0  0 48  0  0
> 5  0 84  0  0  0 84  0  0
> 6  0  0  0 48  0  0  0 48
>
> Example for the first row:
>
> The two largest values are 84 (in V2) and 22 (in V3), so these are
> considered the observed values. Since the largest values are in V2 and
> V3, we have to pick the expected values from W2 and W3, which are 84 and 0. I
> know that for a chi-square test the values should not be 0, but we will ignore the warning.
>
> Now it should generate a p-value for the next row, taking 35 and 84 (V1 and V2)
> as observed and 22 and 84 (W1 and W2) as expected. So here it will do a chi-square
> test for all 6 rows and will generate 6 p-values. My data frame has
> a lot of rows (approx. 9999).
>
> Can you please help me with this?
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ________________________________________
> From: R. Michael Weylandt [michael.weyla...@gmail.com]
> Sent: Wednesday, August 17, 2011 7:11 PM
> To: Bansal, Vikas
> Cc: r-help@r-project.org
> Subject: Re: [R] Chi square test on data frame
>
> I think everything below is right, but it's all a little helter-skelter, so
> take it with a grain of salt.
>
> First things first, give your data to the list with dput():
>
> Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
> 0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
> 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
> ), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
> "V2", "V3", "V4", "W1", "W2", "W3", "W4")))
>
> Now,
>
> Y1 = Y[,1:4]
> Y2 = Y[,-(1:4)]
>
> id = apply(Y1,1,order,decreasing=T)[1:2,]
> # This has the columns you want in each row, but it's not directly
> # appropriate for subsetting.
> # Specifically, the problem is that the row information is implicit in
> # where the column index sits in id.
> # We extract it directly and force it into a 2-column matrix that gives
> # the row and column of each data point.
> id = cbind(as.vector(col(id)),as.vector(id))
>
> Now you can take Y1[id] as the observed values and Y2[id] as the expected.
>
> But, to be honest, it sounds like you have more problems in using a chi-square
> test than anything else. Beyond all the zeros, you should note that you
> always have #obs >= #expected because Y1 >= Y2. I'll leave that up to you, though.
>
> Hope this helps, and please make sure you can take my code apart piece by
> piece to understand it: there's some odd data manipulation that takes
> advantage of R's way of coercing matrices to vectors, and if your actual
> data isn't like the provided example, you may have to modify it.
>
> Michael Weylandt
>
> On Wed, Aug 17, 2011 at 10:26 AM, Bansal, Vikas <vikas.ban...@kcl.ac.uk> wrote:
> Is there anyone who can help me with a chi-square test on a data frame? I have
> been struggling for the last 2 days. I will be very thankful to you.
>
> Dear all,
>
> I have been working on this problem for many hours but did not find
> any solution.
> I have a data frame with 8 columns:
>   V1 V2 V3 V4 W1 W2 W3 W4
> 1  0 84 22 10  0 84  0  0
> 2 35 84  0  0 22 84  0  0
> 3  0  0  0 48  0  0  0 48
> 4  0 48  0  0  0 48  0  0
> 5  0 84  0  0  0 84  0  0
> 6  0  0  0 48  0  0  0 48
>
> From the first four columns, for each row I have to take the two largest values,
> and these two values will be considered the observed values. From the last
> four columns we will get the expected values. So I have to perform a chi-square
> test for each row to get p-values.
>
> Example for the first row:
>
> The two largest values are 84 (in V2) and 22 (in V3), so these are
> considered the observed values. Since the largest values are in V2 and
> V3, we have to pick the expected values from W2 and W3, which are 84 and 0. I
> know that for a chi-square test the values should not be 0, but we will ignore the warning.
> Now, as we have observed as well as expected values, we have to perform a chi-square
> test to get a p-value for each row in a new column.
>
> So far I was returning the indices of the two largest values with
> sort.int(df,index.return=TRUE)$ix[c(4,3)]
> but it does not accept a data frame.
>
> Can you please give some idea how to do this? It is very tricky, and
> after studying a lot I am not able to do it. Please help.
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
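This is a runnable sketch of Michael's indexing step on the six example rows, using his dput() data. It checks only the values that do not depend on how order() breaks ties, which means rows 1 and 2. In rows 3 to 6 several observed values are 0, so which zero column is picked second is left to order(). The column names `row` and `col` on the index matrix are added here for readability. They do not change how `[` uses the matrix.

```r
# Example data from the thread: observed counts in V1-V4, expected in W1-W4
Y <- structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
                 0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
                 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48),
               .Dim = c(6L, 8L),
               .Dimnames = list(c("1", "2", "3", "4", "5", "6"),
                                c("V1", "V2", "V3", "V4", "W1", "W2", "W3", "W4")))

Y1 <- Y[, 1:4]     # observed block
Y2 <- Y[, -(1:4)]  # expected block

# apply() returns one column per row of Y1; keep the positions of the two
# largest values. drop = FALSE keeps a matrix even if Y1 has a single row.
top2 <- apply(Y1, 1, order, decreasing = TRUE)[1:2, , drop = FALSE]

# Two-column (row, column) index matrix: col(top2) holds the original row number
id <- cbind(row = as.vector(col(top2)), col = as.vector(top2))

observed <- Y1[id]  # two values per row, largest observed value first
expected <- Y2[id]  # values from the matching W columns

print(cbind(observed, expected))
```

Because `col(top2)` is read column by column, each row contributes two consecutive entries to `observed` and `expected`. That layout is what the split with `rep(..., each = 2)` in Petr's reply relies on.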
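This is a minimal sketch of the per-row tests. It assumes the observed and expected values arrive in consecutive pairs, as Y1[id] and Y2[id] return them. The twelve numbers below are those pairs for the six example rows.

Given two vectors, chisq.test(x, y) cross-tabulates them as factors. The sketch therefore passes the expected counts as `p` with `rescale.p = TRUE`, which gives one goodness-of-fit test with 1 degree of freedom per row. The grouping vector is built from the number of pairs, so the same code should run unchanged on the roughly 9999-row data set.

```r
# Observed and expected values as Y1[id] and Y2[id] return them for the six
# example rows: two consecutive entries per row, largest observed value first
observed <- c(84, 22, 84, 35, 48, 0, 48, 0, 84, 0, 48, 0)
expected <- c(84,  0, 84, 22, 48, 0, 48, 0, 84, 0, 48, 0)

# Grouping vector 1 1 2 2 3 3 ..., built from the data so any row count works
n_rows <- length(observed) / 2
grp <- rep(seq_len(n_rows), each = 2)

l1 <- split(observed, grp)
l2 <- split(expected, grp)

# Expected counts go in as 'p' and are rescaled to probabilities; the
# small-expected-count warning is suppressed, as the question asks
tests <- mapply(function(o, e) suppressWarnings(chisq.test(o, p = e, rescale.p = TRUE)),
                l1, l2, SIMPLIFY = FALSE)

# One p-value per row; each test has length(o) - 1 = 1 degree of freedom
pvals <- sapply(tests, function(res) unname(res$p.value))
print(pvals)
```

On the example data only row 2 gives an ordinary p-value (about 0.02). Row 1 has an expected count of 0 against an observed 22, so the statistic is infinite and p is 0. Rows 3 to 6 contain 0/0 terms, so their statistic and p-value are NaN. Such terms are most likely also why the single test over all rows returned X-squared = NaN.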
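The original question asks for a p-value for each row, stored in a new column of the data frame. This is a minimal sketch of that goal which works row by row instead of with an index matrix. sort.int() and order() expect a vector, not a data frame, so the V and W columns are turned into matrices and one row at a time is passed to a small helper. `row_pvalue` is a hypothetical name used only for this illustration. The small-expected-count warning is suppressed because the question says it will be ignored.

```r
# The example from the original question, as a data frame
df <- data.frame(V1 = c(0, 35, 0, 0, 0, 0),
                 V2 = c(84, 84, 0, 48, 84, 0),
                 V3 = c(22, 0, 0, 0, 0, 0),
                 V4 = c(10, 0, 48, 0, 0, 48),
                 W1 = c(0, 22, 0, 0, 0, 0),
                 W2 = c(84, 84, 0, 48, 84, 0),
                 W3 = c(0, 0, 0, 0, 0, 0),
                 W4 = c(0, 0, 48, 0, 0, 48))

# Hypothetical helper: p-value for one row, given its observed (v) and
# expected (w) values; the two largest observed values pick the columns
row_pvalue <- function(v, w) {
  top <- order(v, decreasing = TRUE)[1:2]
  res <- suppressWarnings(chisq.test(v[top], p = w[top], rescale.p = TRUE))
  unname(res$p.value)
}

# order() works on vectors, so pull the blocks out as numeric matrices
V <- as.matrix(df[, c("V1", "V2", "V3", "V4")])
W <- as.matrix(df[, c("W1", "W2", "W3", "W4")])

# New column: one p-value per row
df$p.value <- vapply(seq_len(nrow(df)),
                     function(i) row_pvalue(V[i, ], W[i, ]),
                     numeric(1))
print(df)
```

vapply() with `numeric(1)` insists on exactly one number per row, so the new column always lines up with the rows of the data frame. With these example values, rows 1 and 3 to 6 still get p-values of 0 or NaN because of the zero expected counts.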