Re: [R] chisq.test
If you wanted to do a t.test:

res1 <- do.call(cbind, lapply(seq_len(nrow(m)), function(i)
  do.call(rbind, lapply(split(rbind(m[i, -1], n), 1:nrow(rbind(m[i, -1], n))),
    function(x) {
      x1 <- rbind(x, m[i, -1])
      t.test(unlist(x1[1, ]), unlist(x1[2, ]))$p.value
    }))))
res2 <- do.call(cbind, lapply(seq_len(ncol(res1)), function(i)
  c(c(tail(res1[seq(1, i, 1), i], -1), 1), res1[-c(1:i), i])))
attr(res2, "dimnames") <- NULL
res2
#          [,1]      [,2]      [,3]      [,4]
#[1,] 1.000     1.000     1.000     0.6027881
#[2,] 1.000     1.000     1.000     0.5790103
#[3,] 1.000     1.000     1.000     1.000
#[4,] 0.6027881 0.6027881 0.5637881 1.000

Here, the first column tests a2 against a2, a, c and t; the second, c2 against t, c2, a and c; the third, c3 against c, t, c3 and a; and the fourth, t2 against a, c, t and t2.
A.K.

From: Vera Costa
To: arun
Sent: Tuesday, March 5, 2013 9:38 AM
Subject: Re: chisq.test

ok, thank you. I will test.
Thank you very much

>> From: Vera Costa
>> To: arun
>> Sent: Tuesday, March 5, 2013 8:23 AM
>> Subject: Re: chisq.test
>>
>> Sorry if my explanation isn't good...
>>
>> I have these tables:
>>
>> m <- structure(list(id = structure(1:4, .Label = c("a2", "c2", "c3",
>> "t2"), class = "factor"), `1` = c(0L, 0L, 0L, 1L), `2` = c(8L,
>> 8L, 6L, 10L), `3` = c(2L, 2L, 4L, 5L)), .Names = c("id", "1",
>> "2", "3"), row.names = c("a2", "c2", "c3", "t2"), class = "data.frame")
>>
>> n <- structure(c(0, 0, 1, 8, 7, 10, 2, 3, 5), .Dim = c(3L, 3L), .Dimnames =
>> list(c("a", "c", "t"), c("1", "2", "3")))
>>
>> and I need to apply a chisq.test between all of them. I need to compare a2 to a, c and t; then c2 with a, c and t; then c3 with a, c and t.
>>
>> And the output would be something like this:
>>
>>       a    c    t
>> a2  xxx  xxx  xxx
>> c2  xxx  xxx  xxx
>> c3  xxx  xxx  xxx
>> t2  xxx  xxx  xxx
>>
>> where xxx are the p-values.
>>
>> Is it possible?
>>
>> Vera

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
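Vera asked for chi-squared p-values, not t-tests. The sketch below builds the requested 4 x 3 matrix (rows a2, c2, c3, t2; columns a, c, t). For each pair it takes one row of `m` and one row of `n` and tests them as a 2 x k contingency table.

Two things in it are assumptions of this sketch, not part of the thread:
- It uses `fisher.test()` because several expected counts here are far below 5. Replacing `fisher.test(tab)$p.value` with `chisq.test(tab)$p.value` gives the chi-squared version, which will warn on this data.
- `pval_2xk` is a hypothetical helper name.

```r
# Data from Vera's message
m <- structure(list(id = structure(1:4, .Label = c("a2", "c2", "c3", "t2"),
                                   class = "factor"),
                    `1` = c(0L, 0L, 0L, 1L), `2` = c(8L, 8L, 6L, 10L),
                    `3` = c(2L, 2L, 4L, 5L)),
               .Names = c("id", "1", "2", "3"),
               row.names = c("a2", "c2", "c3", "t2"), class = "data.frame")
n <- structure(c(0, 0, 1, 8, 7, 10, 2, 3, 5), .Dim = c(3L, 3L),
               .Dimnames = list(c("a", "c", "t"), c("1", "2", "3")))

counts_m <- as.matrix(m[, -1])  # drop the id column; rownames are a2, c2, c3, t2

# Hypothetical helper: p-value for the 2 x k table formed by two count vectors
pval_2xk <- function(x, y) {
  tab <- rbind(x, y)
  tab <- tab[, colSums(tab) > 0, drop = FALSE]  # drop categories empty in both rows
  if (ncol(tab) < 2) return(NA_real_)           # nothing left to compare
  fisher.test(tab)$p.value                      # or chisq.test(tab)$p.value
}

# Outer sapply runs over a, c, t (columns); inner over a2, c2, c3, t2 (rows)
res <- sapply(rownames(n), function(j)
  sapply(rownames(counts_m), function(i) pval_2xk(counts_m[i, ], n[j, ])))
round(res, 4)
```

Dropping all-zero columns avoids degenerate tables. For example, category "1" is zero for both a2 and a.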
Re: [R] chisq.test
Dear all! Thanks for the clarification. OV

To: Rolf Turner
Sent: Wednesday, June 27, 2012 1:33 PM
Subject: Re: [R] chisq.test

Hi Rolf,
Thanks for spotting the mistake.
A.K.
Re: [R] chisq.test
On 2012-06-26 23:02, John wrote:
> On Wed, 27 Jun 2012 16:58:29 +1200
> Rolf Turner wrote:
>> On 27/06/12 08:54, arun wrote:
>>> The error is due to less than 5 observations in some cells.
>>
>> NO, NO, NO. It's not the observations that matter, it is
>> the ***EXPECTED COUNTS***.
>
> Pretty sure the point was that, in a situation where the expected counts
> are too low for a reliable chi-square test, an alternative test such as the
> nonparametric Fisher's exact test may be the way to go, especially if
> there isn't any more data to get. That way you don't have to worry about
> expected counts.
>
> JDougherty

That may well be; nevertheless, the post included the statement Rolf
quotes: "... less than 5 _observations_ in some cells" (my emphasis).
And Rolf's point is quite correct: it's the _expected_ counts that the
approximation cares about.

Peter Ehlers
Re: [R] chisq.test
On Wed, 27 Jun 2012 16:58:29 +1200
Rolf Turner wrote:

> On 27/06/12 08:54, arun wrote:
> >
> > Hi,
> >
> > The error is due to less than 5 observations in some cells.
>
> NO, NO, NO. It's not the observations that matter, it is
> the ***EXPECTED COUNTS***. These must all be at least
> 5 in order for the null distribution of the test statistic to be
> adequately approximated by a chi-squared distribution.
>
> cheers,
>
> Rolf Turner

Pretty sure the point was that, in a situation where the expected counts are too low for a reliable chi-square test, an alternative test such as the nonparametric Fisher's exact test may be the way to go, especially if there isn't any more data to get. That way you don't have to worry about expected counts.

JDougherty
Re: [R] chisq.test
On 27/06/12 08:54, arun wrote:
>
> Hi,
>
> The error is due to less than 5 observations in some cells.

NO, NO, NO. It's not the observations that matter, it is
the ***EXPECTED COUNTS***. These must all be at least
5 in order for the null distribution of the test statistic to be
adequately approximated by a chi-squared distribution.

cheers,

Rolf Turner
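Rolf's point can be checked directly. Under independence, the expected count for cell (i, j) is (row i total) x (column j total) / (grand total).

The sketch below does three things with the `tabele` data from the original post:
- computes the expected counts by hand;
- confirms they match the `$expected` component returned by `chisq.test()`;
- counts how many fall below 5.

The dimnames are attached to `tabele` directly, which avoids the `tabela` typo in the original code.

```r
# Table from the original post
tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE,
                 dimnames = list(SEX = c("M", "F"),
                                 HAIR = c("Brown", "Black", "Red", "Blonde")))

# Expected counts under independence: row total * column total / grand total.
# Both row totals are 35 here, so each expected count is half its column total.
expected <- outer(rowSums(tabele), colSums(tabele)) / sum(tabele)
expected

# chisq.test() stores the same values in $expected (its warning is suppressed here)
ct <- suppressWarnings(chisq.test(tabele))
all.equal(expected, ct$expected, check.attributes = FALSE)

# How many cells have an expected count below 5?
sum(expected < 5)
```

The cells flagged are the Black and Red columns. Both have small column totals (9 and 8), even though some of their observed counts are not especially small.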
Re: [R] chisq.test
Hi,
The error is due to less than 5 observations in some cells.
You can try:

fisher.test(tabele)

        Fisher's Exact Test for Count Data

data:  tabele
p-value = 0.0998
alternative hypothesis: two.sided

A.K.
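Here is a self-contained version of arun's suggestion, with the table built exactly as in the original post (byrow = TRUE). arun reported p = 0.0998 for this table.

Fisher's exact test conditions on the row and column totals and does not rely on a large-sample chi-squared approximation. Small expected counts therefore do not trigger a warning here.

```r
tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE,
                 dimnames = list(SEX = c("M", "F"),
                                 HAIR = c("Brown", "Black", "Red", "Blonde")))

# Exact test for a 2 x 4 table (network algorithm, no approximation)
ft <- fisher.test(tabele)
ft$p.value
```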
Re: [R] chisq.test
On Jun 26, 2012, at 2:27 PM, Omphalodes Verna wrote:

> Warning message:
> In chisq.test(tabele) : Chi-squared approximation may be incorrect
> ...
> Please, give me an advice / suggestion / recommendation.

Read any introductory stats book regarding small cell sizes:

     [,1] [,2] [,3] [,4]
[1,]   11    3    3    5
[2,]    3   18    6   21

--
David Winsemius, MD
West Hartford, CT
Re: [R] chisq.test
The warning means that you have many cells with expected values less than 5 (4 of 8 cells in this case), so the chi-square statistic may be inflated. The good news is that the probability of the inflated chi-square is .0978, which you probably would not consider significant anyway. If you want a simulated p-value using Monte Carlo simulation (see the references on the manual page for chisq.test), just change the call to

chisq.test(tabele, simulate.p.value=TRUE, B=2000)

When I run this five times, I get probability estimates ranging from .09795 to .1089.

Alternatively, get more data.

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
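Carlson compares two p-values, and the sketch below reproduces both.

- `set.seed(1)` is an arbitrary addition that only makes one particular run repeatable. Without it, the simulated value varies between runs, as Carlson saw.
- `chisq.test()` computes the simulated p-value as (1 + k)/(B + 1), where k is the number of simulated statistics at least as large as the observed one. With `B = 2000`, it is therefore always a multiple of 1/2001.

```r
tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE,
                 dimnames = list(SEX = c("M", "F"),
                                 HAIR = c("Brown", "Black", "Red", "Blonde")))

# Asymptotic p-value; the small-expected-count warning is suppressed here
p_asym <- suppressWarnings(chisq.test(tabele))$p.value

# Monte Carlo p-value from B random tables with the observed margins
set.seed(1)
p_sim <- chisq.test(tabele, simulate.p.value = TRUE, B = 2000)$p.value

c(asymptotic = p_asym, simulated = p_sim)
```

The simulated test draws tables with the same margins as the observed one. It needs all row and column totals to be strictly positive, which holds here.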
Re: [R] chisq.test
On 2012-06-26 11:27, Omphalodes Verna wrote:

> Warning message:
> In chisq.test(tabele) : Chi-squared approximation may be incorrect
> ...
> Please, give me an advice / suggestion / recommendation.

Do this:

ct <- chisq.test(tabele)
ct$expected

If that does not give you a sufficient hint, then you need to review the assumptions underlying the chi-square test.

Peter Ehlers
[R] chisq.test
Dear list!

I would like to calculate "chisq.test" on a simple data set with 70 observations, but the output is a warning message:

Warning message:
In chisq.test(tabele) : Chi-squared approximation may be incorrect

Here is an example:

tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
dimnames(tabele) <- list(
    "SEX" = c("M", "F"),
    "HAIR" = c("Brown", "Black", "Red", "Blonde"))
addmargins(tabele)
prop.table(tabele)
chisq.test(tabele)

Please give me advice / suggestions / recommendations.

Thanks a lot to all,
OV
Re: [R] chisq.test vs manual calculation - why are different results produced?
On Feb 20, 2012, at 5:57 AM, Louise Mair wrote:

> Hello,
>
> I am trying to fit gamma, negative exponential and inverse power functions to a dataset, and then test whether the fit of each curve is good. To do this I have been advised to calculate predicted values for bins of data (I have grouped a continuous range of distances into 1km bins), and then apply a chi-squared test. Example:
>
> data <- data.frame(distance=c(1,2,3,4,5,6,7), observed=c(43,13,10,6,2,1), predicted=c(28, 18, 10, 5 ,3, 1, 1))

There's an error with that code.

> chisq.test(data$observed, data$predicted)
>
> Which gives:
>
>         Pearson's Chi-squared test
>
> data:  data$observed and data$predicted
> X-squared = 35, df = 25, p-value = 0.0882
>
> Warning message:
> In chisq.test(data$observed, data$predicted) :
>   Chi-squared approximation may be incorrect
>
> I understand this is due to having observed/predicted values of less than five, however I am interested to know firstly why R uses such a large number of degrees of freedom (when by my understanding there should only be 4 df), and secondly whether using the following manual calculation is therefore inappropriate -

Read the help page, Details section, end of the second paragraph. You probably wanted:

chisq.test(cbind(data$observed, data$predicted))

> X2 <- sum(((data$observed - data$predicted)^2)/data$predicted)
> 1-pchisq(X2,4)
> [1] 0.04114223
>
> If chi-squared is unsuitable, what other test can I use to determine whether my observed and predicted data come from the same distribution? The frequently recommended Fisher's test doesn't seem to be any more appropriate as it requires values of greater than 5 for contingency tables larger than 2 x 2.
>
> Thanks for your help.
> Louise
>
> [[alternative HTML version deleted]]

Plain text is requested as the mail format.

--
David Winsemius, MD
West Hartford, CT
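Louise's two-vector call `chisq.test(data$observed, data$predicted)` cross-tabulates the two vectors as factors. That is what produces the large number of degrees of freedom. For comparing observed bin counts with model-predicted counts, there are two usual routes, sketched below.

1. The goodness-of-fit form of `chisq.test()`, with the predicted counts passed as `p` and `rescale.p = TRUE`. It always uses df = number of bins - 1.
2. The hand calculation from the question, which lets you also subtract degrees of freedom for parameters estimated when fitting the curve.

The data are hypothetical, not Louise's: five bins whose predicted counts sum to the observed total, so both routes give the same statistic. `n_par` (the number of estimated curve parameters) is likewise a hypothetical value.

```r
# Hypothetical observed and predicted counts for 5 bins (equal totals: 80)
observed  <- c(30, 20, 15, 10, 5)
predicted <- c(28, 22, 14, 9, 7)

# Route 1: goodness-of-fit test; predicted counts are rescaled to probabilities
gof <- chisq.test(observed, p = predicted, rescale.p = TRUE)
gof$statistic   # Pearson X^2
gof$parameter   # df = length(observed) - 1

# Route 2: same statistic by hand, with df reduced by estimated parameters
n_par <- 2      # hypothetical number of parameters fitted from the data
X2 <- sum((observed - predicted)^2 / predicted)
df <- length(observed) - 1 - n_par
p_value <- pchisq(X2, df, lower.tail = FALSE)   # same as 1 - pchisq(X2, df)
c(X2 = X2, df = df, p = p_value)
```

If the predicted counts do not sum to the observed total, route 1 rescales them and the two statistics will differ.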
[R] chisq.test vs manual calculation - why are different results produced?
Hello,

I am trying to fit gamma, negative exponential and inverse power functions to a dataset, and then test whether the fit of each curve is good. To do this I have been advised to calculate predicted values for bins of data (I have grouped a continuous range of distances into 1km bins), and then apply a chi-squared test. Example:

> data <- data.frame(distance=c(1,2,3,4,5,6,7), observed=c(43,13,10,6,2,1), predicted=c(28, 18, 10, 5 ,3, 1, 1))
> chisq.test(data$observed, data$predicted)

Which gives:

        Pearson's Chi-squared test

data:  data$observed and data$predicted
X-squared = 35, df = 25, p-value = 0.0882

Warning message:
In chisq.test(data$observed, data$predicted) :
  Chi-squared approximation may be incorrect

I understand this is due to having observed/predicted values of less than five; however, I am interested to know, firstly, why R uses such a large number of degrees of freedom (when by my understanding there should only be 4 df), and secondly, whether using the following manual calculation is therefore inappropriate:

> X2 <- sum(((data$observed - data$predicted)^2)/data$predicted)
> 1-pchisq(X2,4)
[1] 0.04114223

If chi-squared is unsuitable, what other test can I use to determine whether my observed and predicted data come from the same distribution? The frequently recommended Fisher's test doesn't seem to be any more appropriate, as it requires values greater than 5 for contingency tables larger than 2 x 2.

Thanks for your help.
Louise
Re: [R] chisq.test(): standardized (adjusted) Pearson residuals
On Aug 20, 2011, at 12:57 PM, peter dalgaard wrote:

> On Aug 20, 2011, at 18:04 , Stephen Davies wrote:
>
>> As for "$stdres," that would be wonderful, but as you can see from the
>> above list of attributes, it's not one of the 8 returned. What am I missing?
>
> An upgrade, most likely.

Whoosh. Sometimes I am simply clueless. I didn't notice that 'stdres' was missing from the names in Stephen's output.

Laura Thompson has a very nice R/S accompaniment to Agresti's "Categorical Data Analysis" text, and she shows how to adjust the Pearson residuals to make them "standardized". What follows is directly from pages 37-38 of her work:

#--#
resid.pear <- residuals(fit.glm, type = "pearson")

Note that the sum of the squared Pearson residuals equals the Pearson chi-squared statistic:

sum(resid.pear^2)
[1] 69.11429

To get the standardized residuals, just modify resid.pear according to the formula on p. 81 of Agresti.

ni <- rowSums(table.3.2.array)  # row sums
nj <- colSums(table.3.2.array)  # column sums
n <- sum(table.3.2.array)       # total sample size
resid.pear.mat <- matrix(resid.pear, nc = 3, byrow = T,
    dimnames = list(c(" "Bachelor or Grad"), c("Fund", "Mod", "Lib")))
n * resid.pear.mat / sqrt(outer(n - ni, n - nj, "*"))  # standardized Pearson residuals
        Fund     Mod     Lib
#--#

You can also look at the code (once you upgrade). The method in R is quite similar, although the R code calculates the stdres values separately rather than by adjusting the Pearson residuals.

David Winsemius, MD
West Hartford, CT
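Thompson's adjustment divides each Pearson residual by sqrt((1 - n_i+/n)(1 - n_+j/n)). Written another way, that is n * resid / sqrt((n - n_i+)(n - n_+j)).

Thompson's own data (`fit.glm`, `table.3.2.array`) are not available here. The sketch below instead uses the gender-by-party counts that appear in the `str(Xsq)` output in this thread. It checks the hand computation against the `stdres` component that newer versions of `chisq.test()` return.

```r
# Gender x party counts, as shown in the str(Xsq) output in this thread
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("M", "F"),
                    party = c("Democrat", "Independent", "Republican"))

ni <- rowSums(M)            # row totals
nj <- colSums(M)            # column totals
n  <- sum(M)                # grand total
E  <- outer(ni, nj) / n     # expected counts under independence

resid.pear <- (M - E) / sqrt(E)                          # Pearson residuals
stdres <- n * resid.pear / sqrt(outer(n - ni, n - nj))   # Agresti's adjustment

Xsq <- chisq.test(M)
max(abs(stdres - Xsq$stdres))   # agreement up to rounding error
round(stdres, 3)
```

R computes the same quantity directly as (O - E) / sqrt(V), with V = r c (n - r)(n - c) / n^3. This is algebraically identical to rescaling the Pearson residual.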
Re: [R] chisq.test(): standardized (adjusted) Pearson residuals
On Aug 20, 2011, at 18:04 , Stephen Davies wrote:

> As for "$stdres," that would be wonderful, but
> as you can see from the above list of attributes, it's not one of the 8
> returned. What am I missing?

An upgrade, most likely.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
"Døden skal tape!" --- Nordahl Grieg
Re: [R] chisq.test(): standardized (adjusted) Pearson residuals
On Aug 20, 2011, at 12:04 PM, Stephen Davies wrote:

>>>> I'm using chisq.test() on a matrix of categorical data, and I see
>>>> that the "residuals" attribute of the returned object will give me the
>>>> Pearson residuals.
>>
>> Actually they are not an attribute in the R sense, but rather a list
>> value.
>
> Oh. I was just going by:
>
>   attributes(my.chisq.test)
>   $names
>   [1] "statistic" "parameter" "p.value"   "method"    "data.name" "observed"
>   [7] "expected"  "residuals"
>
>   $class
>   [1] "htest"
>
> which I interpreted as "this object has 8 attributes, called 'statistic',
> 'parameter', ..., 'residuals'." Is that not the right terminology?

The names attribute lets you know what characters to use if you want to access values in a list. Unless you are programming, attributes() is not a particularly useful function. It is much more common to access the names attribute with the `names` function:

  > names(Xsq)
  [1] "statistic" "parameter" "p.value"   "method"    "data.name" "observed"  "expected"
  [8] "residuals" "stdres"

So "stdres" is not an attribute but rather one value in a particular attribute called "names". You would get (much) more information by using str on the htest object, as below:

  > str(Xsq)
  List of 9
   $ statistic: Named num 30.1
    ..- attr(*, "names")= chr "X-squared"
   $ parameter: Named num 2
    ..- attr(*, "names")= chr "df"
   $ p.value  : num 2.95e-07
   $ method   : chr "Pearson's Chi-squared test"
   $ data.name: chr "M"
   $ observed : table [1:2, 1:3] 762 484 327 239 468 477
    ..- attr(*, "dimnames")=List of 2
    .. ..$ gender: chr [1:2] "M" "F"
    .. ..$ party : chr [1:3] "Democrat" "Independent" "Republican"
   $ expected : num [1:2, 1:3] 704 542 320 246 534 ...
    ..- attr(*, "dimnames")=List of 2
    .. ..$ gender: chr [1:2] "M" "F"
    .. ..$ party : chr [1:3] "Democrat" "Independent" "Republican"
   $ residuals: table [1:2, 1:3] 2.199 -2.505 0.411 -0.469 -2.843 ...
    ..- attr(*, "dimnames")=List of 2
    .. ..$ gender: chr [1:2] "M" "F"
    .. ..$ party : chr [1:3] "Democrat" "Independent" "Republican"
   $ stdres   : table [1:2, 1:3] 4.502 -4.502 0.699 -0.699 -5.316 ...
    ..- attr(*, "dimnames")=List of 2
    .. ..$ gender: chr [1:2] "M" "F"
    .. ..$ party : chr [1:3] "Democrat" "Independent" "Republican"
   - attr(*, "class")= chr "htest"

Now you can see that the values in the stdres object are really a list element and are in a table with particular row and column names. You can get that object in one of two ways: you can use the "$" method as Dalgaard suggested, or you can use "[[" with the name of the object:

  Xsq[["stdres"]]

David Winsemius, MD
West Hartford, CT
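The sketch below illustrates the distinction Winsemius draws. The components of an `htest` object are list elements; `names` and `class` are its only attributes.

It rebuilds the gender-by-party table seen in the `str(Xsq)` output above. The exact set of components depends on the R version; older versions lacked `stdres`, which was Stephen's problem.

```r
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("M", "F"),
                    party = c("Democrat", "Independent", "Republican"))
Xsq <- chisq.test(M)

names(attributes(Xsq))                  # only "names" and "class"
names(Xsq)                              # the list components, including "stdres"
is.null(attr(Xsq, "stdres"))            # TRUE: stdres is a list element, not an attribute
identical(Xsq$stdres, Xsq[["stdres"]])  # two equivalent ways to extract it
```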
Re: [R] chisq.test(): standardized (adjusted) Pearson residuals
> >>> I'm using chisq.test() on a matrix of categorical data, and I see
> >>> that the "residuals" attribute of the returned object will give me
> >>> the Pearson residuals.
>
> Actually they are not an attribute in the R sense, but rather a list
> value.

Oh. I was just going by:

> attributes(my.chisq.test)
$names
[1] "statistic" "parameter" "p.value"   "method"    "data.name" "observed"
[7] "expected"  "residuals"

$class
[1] "htest"

which I interpreted as "this object has 8 attributes, called 'statistic', 'parameter', ..., 'residuals'." Is that not the right terminology?

> >>> That's cool. However, what I'd really like is the standardized
> >>> (adjusted) Pearson residuals, which have a N(0,1) distribution. Is
> >>> there a way to do that in R (other than by me programming it myself?)
> >>
> >> ?scale
> >
> > chisq.test(...)$stdres, more likely.

"scale" is not what I want. As for "$stdres," that would be wonderful, but as you can see from the above list of attributes, it's not one of the 8 returned. What am I missing?

- Stephen
Re: [R] chisq.test(): standardized (adjusted) Pearson residuals
On Aug 20, 2011, at 3:43 AM, peter dalgaard wrote:

> On Aug 19, 2011, at 20:40 , David Winsemius wrote:
>
>> On Aug 19, 2011, at 1:28 PM, Stephen Davies wrote:
>>
>>> I'm using chisq.test() on a matrix of categorical data, and I see that the
>>> "residuals" attribute of the returned object will give me the Pearson
>>> residuals.

Actually they are not an attribute in the R sense, but rather a list value.

>>> That's cool. However, what I'd really like is the standardized (adjusted)
>>> Pearson residuals, which have a N(0,1) distribution. Is there a way to do
>>> that in R (other than by me programming it myself?)
>>
>> ?scale
>
> chisq.test(...)$stdres, more likely.

Agreed, that does have a much greater chance of keeping the questioner in the mainstream of statistics terminology and is most likely what he was looking for, but I do not think the result will in general have an N(0,1) distribution. I believe the correct statement is that standardized residuals would (in the statistical "asymptotic" sense) have an N(0,1) distribution if and when the null hypothesis of independence were true, but should not be N(0,1) in any case when an alternative hypothesis holds. My error was in taking the questioner's request at face value.

--
David Winsemius, MD
West Hartford, CT
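Winsemius's point, that the N(0,1) behaviour of standardized residuals holds only approximately and only under the null, can be illustrated by simulation. This is an illustration added here, not something from the thread.

- The sketch uses `r2dtable()` to draw random tables with the gender-by-party margins from this thread. This mimics sampling under independence with fixed margins.
- It then looks at the mean and standard deviation of one standardized residual across the simulated tables.
- The seed and the 2000 replicates are arbitrary choices.

```r
M <- rbind(c(762, 327, 468), c(484, 239, 477))   # gender x party counts

# 2000 random tables with the observed row and column totals (null model)
set.seed(42)
sims <- r2dtable(2000, rowSums(M), colSums(M))

# Standardized residual of cell [1, 1] in each simulated table
z <- vapply(sims, function(tab) chisq.test(tab)$stdres[1, 1], numeric(1))
c(mean = mean(z), sd = sd(z))    # close to 0 and 1 under the null

# The observed table's value, about 4.5, is far out in that reference distribution
chisq.test(M)$stdres[1, 1]
```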
Re: [R] chisq.test(): standardized (adjusted) Pearson residuals
On Aug 19, 2011, at 20:40 , David Winsemius wrote:

>
> On Aug 19, 2011, at 1:28 PM, Stephen Davies wrote:
>
>> I'm using chisq.test() on a matrix of categorical data, and I see that the
>> "residuals" attribute of the returned object will give me the Pearson
>> residuals.
>> That's cool. However, what I'd really like is the standardized (adjusted)
>> Pearson residuals, which have a N(0,1) distribution. Is there a way to do
>> that in R (other than by me programming it myself?)
>
> ?scale

chisq.test(...)$stdres, more likely.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
"Døden skal tape!" --- Nordahl Grieg
Re: [R] chisq.test(): standardized (adjusted) Pearson residuals
On Aug 19, 2011, at 1:28 PM, Stephen Davies wrote:

> I'm using chisq.test() on a matrix of categorical data, and I see that the
> "residuals" attribute of the returned object will give me the Pearson
> residuals. That's cool. However, what I'd really like is the standardized
> (adjusted) Pearson residuals, which have a N(0,1) distribution. Is there a
> way to do that in R (other than by me programming it myself?)

?scale

--
David Winsemius, MD
West Hartford, CT
[R] chisq.test(): standardized (adjusted) Pearson residuals
I'm using chisq.test() on a matrix of categorical data, and I see that the "residuals" attribute of the returned object will give me the Pearson residuals. That's cool. However, what I'd really like is the standardized (adjusted) Pearson residuals, which have a N(0,1) distribution. Is there a way to do that in R (other than by me programming it myself?)
[R] chisq.test and cbind
Hi,

This is a mixed conceptual/methodological issue. I have 3 years and 2 localities, and I want to compare the sex ratio (SR) series between the two localities. I can do it year by year, for instance:

> SR2010 <- data.frame(FAO = c(96, 52), JUNC = c(60, 42))
> SR2010
  FAO JUNC
1  96   60
2  52   42
> chisq.test(SR2010)

        Pearson's Chi-squared test with Yates' continuity correction

data:  SR2010
X-squared = 0.6995, df = 1, p-value = 0.4030

Or perhaps I could be interested in testing whether there is any difference in SR along my time series (just three years). Is this correct?

> data1 <- data.frame(Mfao = c(173, 96, 96), Ffao = c(136, 62, 52), Mjunc = c(7, 26, 60), Fjunc = c(5, 23, 42))
> data1
  Mfao Ffao Mjunc Fjunc
1  173  136     7     5
2   96   62    26    23
3   96   52    60    42
> attach(data1)
> chisq.test(cbind(Mfao, Ffao), cbind(Mjunc, Fjunc))

        Pearson's Chi-squared test

data:  cbind(Mfao, Ffao)
X-squared = 3.4443, df = 2, p-value = 0.1787

Thanks in advance for any response
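There is a problem with the second call. `?chisq.test` documents that `y` is ignored when `x` is a matrix. So `chisq.test(cbind(Mfao, Ffao), cbind(Mjunc, Fjunc))` only tests the FAO male/female counts across the three years, which is why the output says `data: cbind(Mfao, Ffao)` and df = 2.

The sketch below does two things:
- demonstrates that `y` is ignored;
- runs one 2 x 2 sex-by-locality test per year, the same layout as the SR2010 example.

It uses `with()` instead of `attach()`. The per-year loop is one illustrative way to organize the comparison, not the only valid analysis.

```r
data1 <- data.frame(Mfao = c(173, 96, 96), Ffao = c(136, 62, 52),
                    Mjunc = c(7, 26, 60), Fjunc = c(5, 23, 42))

# y is ignored when x is a matrix: both calls give the same statistic
s_with_y <- with(data1, chisq.test(cbind(Mfao, Ffao), cbind(Mjunc, Fjunc)))$statistic
s_x_only <- with(data1, chisq.test(cbind(Mfao, Ffao)))$statistic
all.equal(s_with_y, s_x_only)

# One 2 x 2 (sex x locality) test per year; row k of the result is year k
per_year <- t(sapply(seq_len(nrow(data1)), function(k) {
  tab <- matrix(c(data1$Mfao[k], data1$Ffao[k], data1$Mjunc[k], data1$Fjunc[k]),
                nrow = 2,
                dimnames = list(sex = c("M", "F"), site = c("FAO", "JUNC")))
  ct <- chisq.test(tab)   # Yates' correction is applied to 2 x 2 tables
  c(X2 = unname(ct$statistic), p = ct$p.value)
}))
per_year
```

The third row reproduces the SR2010 result, since those are the same counts.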
Re: [R] chisq.test on samples of different lengths
On Aug 24, 2010, at 4:12 PM, Marino Taussig De Bodonia, Agnese wrote: > Hello, > > I am trying to see whether there has been a significant difference in whether > people experienced damages from wildlife in two different years. I therefore > have two columns: > > year 1: > yes > no > no > no > yes > yes > no > > year 2: > no > yes > no > yes > > I wanted to do a chisq.test, but if I enter it this way: > > chisq.test(year1, year2) > > I get the error saying the columns are two different lengths. So then I tried > doing: > > damages<-matrix(c(3,4, 2,2), ncol=2, dimnames=list(answer=c("yes", "no"), > year=c("year1", year2))) > chisq.test(damages) > > Does that make sense? Should I maybe be doing a different test instead? The procedure is fine as such. A more automated way would be to mat <- cbind(table(year1),table(year2)) chisq.test(mat) (some may prefer rbind(...), but the chi-square won't care) The issue with the two-variable format is that it expects cross-classifying factors of the same individuals, not two independent groups. So you might do answer <- c(year1,year2) year <- rep(1:2, length(year1),length(year2)) table(answer, year) # just for enlightenment chisq.test(answer, year) Another matter is that you are below the usual rule of thumb for chi-square: expected >5 obs in all 4 cells, which is obviously not going to happen with 10 observations in total. fisher.test is an option, but you need pretty extreme configurations to obtain significance. (BTW, all of the above assumes that there are no empty cells. Caveat emptor.) > > Any help would be appreciated, thank you. > > Agnese > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
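A runnable sketch of both of Peter's suggestions on the data from Agnese's post, plus the fisher.test() option he mentions. The character vectors year1 and year2 are transcribed from the post. The grouping vector uses rep(1:2, c(length(year1), length(year2))), which repeats each year label once per respondent in that year, so answer and year have the same length. Both layouts produce the same 2x2 table and therefore the same p-value.

```r
# Answers from the post: 7 respondents in year 1, 4 in year 2
year1 <- c("yes", "no", "no", "no", "yes", "yes", "no")
year2 <- c("no", "yes", "no", "yes")

# Tabulate each year separately and bind the counts into a 2x2 table
mat <- cbind(year1 = table(year1), year2 = table(year2))
mat
p_mat <- chisq.test(mat)$p.value      # warns: some expected counts are below 5

# Stack the answers and label each one with its year (one label per respondent)
answer <- c(year1, year2)
year <- rep(1:2, c(length(year1), length(year2)))
table(answer, year)
p_long <- chisq.test(answer, year)$p.value

# Fisher's exact test avoids the chi-squared approximation for small counts
p_fisher <- fisher.test(mat)$p.value
c(matrix = p_mat, stacked = p_long, fisher = p_fisher)
```

With counts this small, none of the three tests comes anywhere near significance, which is in line with Peter's remark about needing extreme configurations.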
Re: [R] chisq.test on samples of different lengths
On Aug 24, 2010, at 10:12 AM, Marino Taussig De Bodonia, Agnese wrote:

> Hello,
>
> I am trying to see whether there has been a significant difference in whether people experienced damages from wildlife in two different years. I therefore have two columns:
>
> year 1: yes no no no yes yes no
>
> year 2: no yes no yes
>
> I wanted to do a chisq.test, but if I enter it this way:
>
> chisq.test(year1, year2)
>
> I get the error saying the columns are two different lengths. So then I tried doing:
>
> damages<-matrix(c(3,4, 2,2), ncol=2, dimnames=list(answer=c("yes", "no"), year=c("year1", year2)))
> chisq.test(damages)

Which should throw an error because year2 is not quoted.

Consider using prop.test:

?prop.test

So your matrix is the transpose of what is needed for prop.test, at least as I read the docs:

> damages<-matrix(c(3,4, 2,2), ncol=2, byrow=TRUE, dimnames=list(year=c("year1", "year2"),success=c("yes", "no")))
> damages
       success
year    yes no
  year1   3  4
  year2   2  2
> prop.test(damages)

        2-sample test for equality of proportions with continuity correction

data:  damages
X-squared = 0, df = 1, p-value = 1
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.7548099  0.6119528
sample estimates:
   prop 1    prop 2
0.4285714 0.5000000

Warning message:
In prop.test(damages) : Chi-squared approximation may be incorrect

> Does that make sense? Should I maybe be doing a different test instead?
>
> Any help would be appreciated, thank you.
>
> Agnese

David Winsemius, MD
West Hartford, CT
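prop.test() also accepts the counts directly as a vector of successes x and a vector of trials n, which avoids having to get the matrix orientation right. A minimal sketch with the numbers from the damages table above ("yes" counted as a success):

```r
# Successes ("yes") and number of respondents for each year, from the damages table
yes <- c(year1 = 3, year2 = 2)
n   <- c(year1 = 7, year2 = 4)

# Same test as prop.test(damages); warns because some expected counts are below 5
res <- prop.test(x = yes, n = n)
res$estimate   # observed proportions of "yes" in each year
res$p.value
```

This should reproduce the estimates 0.4285714 and 0.5 and the p-value of 1 shown above.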
[R] chisq.test on samples of different lengths
Hello,

I am trying to see whether there has been a significant difference in whether people experienced damages from wildlife in two different years. I therefore have two columns:

year 1: yes no no no yes yes no

year 2: no yes no yes

I wanted to do a chisq.test, but if I enter it this way:

chisq.test(year1, year2)

I get the error saying the columns are two different lengths. So then I tried doing:

damages<-matrix(c(3,4, 2,2), ncol=2, dimnames=list(answer=c("yes", "no"), year=c("year1", year2)))
chisq.test(damages)

Does that make sense? Should I maybe be doing a different test instead?

Any help would be appreciated, thank you.

Agnese
Re: [R] chisq.test: decreasing p-value
Thanks to Peter, David, and Michael!

After correcting the coding error, the p-values converge to a particular value, not necessarily zero.

The whole story is: 634 respondents in 6 different areas marked their answers on a 7-step Likert scale (very bad, bad, ..., very good; later recoded to 5 levels). The statistical question now is whether the answer distribution (number of goods, bads, etc.) in any area differs from the "mean" answer distribution obtained by summing all goods, bads, etc. over the areas. Anyway, an omnibus chi-square test would not answer my question, and because of spurious significances I'd rather go back to my chi-square book ;-) (for those interested, see http://sozmod.eawag.ch/files/file.Robj for the entire table).

Thanks for your help

Sören
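One possible sketch of the per-area comparison Sören describes, assuming the data are an areas-by-levels matrix of counts (the linked file is not reproduced here, and area_vs_pooled is a hypothetical helper name). Each area gets a goodness-of-fit chisq.test() against the pooled proportions. Two caveats apply: each area also contributes to the pooled distribution, so the tests are not independent of it, and with six areas some multiple-testing correction (for example p.adjust()) is worth considering. Levels with zero pooled counts should be dropped first.

```r
# Hypothetical helper. `tab`: matrix of answer counts, rows = areas, columns = Likert levels
area_vs_pooled <- function(tab) {
  # "Mean" answer distribution: pooled proportions over all areas
  pooled <- colSums(tab) / sum(tab)
  # One goodness-of-fit test per area (row) against the pooled proportions
  apply(tab, 1, function(counts) chisq.test(counts, p = pooled)$p.value)
}

# Usage, once the real table is loaded as `tab`:
# p.adjust(area_vs_pooled(tab), method = "holm")
```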
Re: [R] chisq.test: decreasing p-value
Thanks to Peter Dalgaard for the correct answer. I misinterpreted what R was returning.

On Mar 11, 2009, at 7:32 AM, David Winsemius wrote:

> On Mar 11, 2009, at 6:36 AM, soeren.vo...@eawag.ch wrote:
>
>> A Likert scale may have produced counts of answers per category. According to theory I may expect equality over the categories. A statistical test should reveal the actual equality in my sample.
>>
>> When applying a chi-square test with an increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (it looks as if it converges to zero).
>>
>> (1) Why?
>
> With low numbers of repetitions the test has low power, i.e., it may give you the wrong answer to the question: are those two vectors from the same distribution? As you increase in number, the simulated value approaches the "truth".
>
>> (2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same?
>
> "In principle" they are not the same. Do you want a test that tells you they are?
>
>> (3) By the way, how should I deal with low-frequency cells?
>>
>> r <- c(10, 100, 500, 1000, 2000, 5000)
>> v <- c(35, 40, 45, 45, 40, 35)
>> sapply(list(r), function (x) { chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value })

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] chisq.test: decreasing p-value
On Mar 11, 2009, at 6:36 AM, soeren.vo...@eawag.ch wrote:

> A Likert scale may have produced counts of answers per category. According to theory I may expect equality over the categories. A statistical test should reveal the actual equality in my sample.
>
> When applying a chi-square test with an increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (it looks as if it converges to zero).
>
> (1) Why?

With low numbers of repetitions the test has low power, i.e., it may give you the wrong answer to the question: are those two vectors from the same distribution? As you increase in number, the simulated value approaches the "truth".

> (2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same?

"In principle" they are not the same. Do you want a test that tells you they are?

> (3) By the way, how should I deal with low-frequency cells?
>
> r <- c(10, 100, 500, 1000, 2000, 5000)
> v <- c(35, 40, 45, 45, 40, 35)
> sapply(list(r), function (x) { chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value })
>
> Thank you, Sören
>
> --
> Sören Vogel, PhD-Student, Eawag, Dept. SIAM
> http://www.eawag.ch, http://sozmod.eawag.ch

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] chisq.test: decreasing p-value
soeren.vo...@eawag.ch wrote:
> A Likert scale may have produced counts of answers per category.
> According to theory I may expect equality over the categories. A
> statistical test should reveal the actual equality in my sample.
>
> When applying a chi-square test with an increasing number of repetitions
> (simulate.p.value) over a fixed sample, the p-value decreases
> dramatically (it looks as if it converges to zero).
>
> (1) Why?
> (2) (If this test is wrong), then which test can check what I want to
> check, that is: are the two distributions of frequencies (observed and
> expected) in principle the same?
> (3) By the way, how should I deal with low-frequency cells?
>
> r <- c(10, 100, 500, 1000, 2000, 5000)
> v <- c(35, 40, 45, 45, 40, 35)
> sapply(list(r), function (x) { chisq.test(v, p=c(rep.int(40, 6)),
> rescale.p=T, simulate.p.value=T, B=x)$p.value })

This is a combination of user error and an infelicity in chisq.test. You are sapply'ing over a list with one element, so essentially you are doing

chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=r)$p.value

Now B is supposed to be a single integer, so the above cannot be expected to do anything sensible, but you might have hoped for an error message. Instead, it seems that you get the result of r[1] replications divided by r+1:

> chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=r)$p.value
[1] 0.636363636 0.069306931 0.013972056 0.006993007 0.003498251 0.001399720
> 7/(r+1)
[1] 0.636363636 0.069306931 0.013972056 0.006993007 0.003498251 0.001399720

What you really wanted was

> sapply(r,function (x) { chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value })
[1] 0.9090909 0.8118812 0.7964072 0.7672328 0.8025987 0.7932414

> Thank you, Sören
>
> --
> Sören Vogel, PhD-Student, Eawag, Dept. SIAM
> http://www.eawag.ch, http://sozmod.eawag.ch

--
   O__   Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)                      FAX: (+45) 35327907
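A sketch of Peter's fix that also computes the asymptotic p-value of the same goodness-of-fit test for comparison. Here p = rep(1/6, 6) is equivalent to the original p = rep.int(40, 6) with rescale.p = TRUE, and the seed is an arbitrary choice made only for reproducibility. The simulated p-value is (1 + number of simulated statistics at least as large as the observed one) / (B + 1), so for small B it can only take a few coarse values; as B grows it settles near a fixed value instead of drifting towards zero.

```r
r <- c(10, 100, 500, 1000, 2000, 5000)
v <- c(35, 40, 45, 45, 40, 35)
set.seed(1)  # arbitrary seed, only so the simulated p-values are reproducible

# Iterate over the elements of r, so that each call receives a single value of B
p_sim <- sapply(r, function(B) {
  chisq.test(v, p = rep(1/6, 6), simulate.p.value = TRUE, B = B)$p.value
})
setNames(round(p_sim, 4), r)

# Asymptotic (chi-squared approximation) p-value for the same test
p_asym <- chisq.test(v, p = rep(1/6, 6))$p.value
p_asym
```

Here the statistic is 100/40 = 2.5 on 5 degrees of freedom, so the asymptotic p-value is pchisq(2.5, 5, lower.tail = FALSE), roughly in the same range as the simulated values in Peter's corrected output.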
[R] chisq.test: decreasing p-value
A Likert scale may have produced counts of answers per category. According to theory I may expect equality over the categories. A statistical test should reveal the actual equality in my sample.

When applying a chi-square test with an increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (it looks as if it converges to zero).

(1) Why?
(2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same?
(3) By the way, how should I deal with low-frequency cells?

r <- c(10, 100, 500, 1000, 2000, 5000)
v <- c(35, 40, 45, 45, 40, 35)
sapply(list(r), function (x) { chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value })

Thank you, Sören

--
Sören Vogel, PhD-Student, Eawag, Dept. SIAM
http://www.eawag.ch, http://sozmod.eawag.ch
Re: [R] chisq.test in batch
On 11-Oct-07 22:11:46, João Fadista wrote:
> Dear all,
>
> I would like to compute hundreds of chisq.test() calls, and from each test
> output I would like to extract only the p-value. So my question is:
> how can I do this without doing it manually?

?chisq.test (under "Value") tells you that one component of the output is p.value, so:

for(i in (1:10)){
  x <- matrix(sample((1:100),4),nrow=2)
  print(chisq.test(x)$p.value)
}
[1] 0.0009193404
[1] 8.822807e-07
[1] 0.005263787
[1] 0.3424672
[1] 5.72495e-07
[1] 5.29765e-05
[1] 0.6812334
[1] 0.0514063
[1] 8.361445e-13
[1] 0.02701781

[remaining output snipped :)]

Best wishes,
Ted.

E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 11-Oct-07                                       Time: 23:41:44
-- XFMail --
[R] chisq.test in batch
Dear all,

I would like to compute hundreds of chisq.test() calls, and from each test output I would like to extract only the p-value. So my question is: how can I do this without doing it manually?

Example:

# Test nº1
> chisq.test(c(220,240))

        Chi-squared test for given probabilities

data:  c(220, 240)
X-squared = 0.8696, df = 1, p-value = 0.3511

# Test nº2
> chisq.test(c(301,258))

        Chi-squared test for given probabilities

data:  c(301, 258)
X-squared = 3.3077, df = 1, p-value = 0.06896

...

# Test nº200
> chisq.test(c(242,281))

        Chi-squared test for given probabilities

data:  c(242, 281)
X-squared = 2.9082, df = 1, p-value = 0.08813

Desired output:

Test      1        2        ...  200
p-value   0.3511   0.06896  ...  0.08813

Thanks in advance.

Best regards,
João Fadista
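A sketch of the batch approach without an explicit loop, using the three example vectors from the post (rows named by test number, as in the desired output). apply() runs chisq.test() on each row and keeps only the p.value component that Ted points to; when the count vectors differ in length, a named list with sapply() does the same job.

```r
# Example count vectors from the post, one row per test, named by test number
counts <- rbind("1" = c(220, 240), "2" = c(301, 258), "200" = c(242, 281))

# One goodness-of-fit test per row; keep only the p-value
p_values <- apply(counts, 1, function(x) chisq.test(x)$p.value)
signif(p_values, 4)

# Same idea when the vectors differ in length: keep them in a named list
count_list <- list("1" = c(220, 240), "2" = c(301, 258), "200" = c(242, 281))
p_list <- sapply(count_list, function(x) chisq.test(x)$p.value)
```

The result is a named numeric vector, so the test numbers travel with the p-values and can be printed or written out in the "Test / p-value" layout shown above.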