[R] chisq.test: decreasing p-value
A Likert scale may have produced counts of answers per category. According to theory, I may expect equality over the categories. A statistical test shall reveal whether that equality actually holds in my sample.

When applying a chi-square test with an increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (it looks as if it converges to zero).

(1) Why?

(2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same?

(3) By the way, how to deal with low frequency cells?

    r <- c(10, 100, 500, 1000, 2000, 5000)
    v <- c(35, 40, 45, 45, 40, 35)
    sapply(list(r), function (x) {
      chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value
    })

Thank you,
Sören

-- 
Sören Vogel, PhD-Student, Eawag, Dept. SIAM
http://www.eawag.ch, http://sozmod.eawag.ch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] chisq.test: decreasing p-value
soeren.vo...@eawag.ch wrote:

> A Likert scale may have produced counts of answers per category. According to theory, I may expect equality over the categories. A statistical test shall reveal whether that equality actually holds in my sample.
>
> When applying a chi-square test with an increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (it looks as if it converges to zero).
>
> (1) Why?
>
> (2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same?
>
> (3) By the way, how to deal with low frequency cells?
>
>     r <- c(10, 100, 500, 1000, 2000, 5000)
>     v <- c(35, 40, 45, 45, 40, 35)
>     sapply(list(r), function (x) {
>       chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value
>     })

This is a combination of user error and an infelicity in chisq.test.

You are sapply'ing over a list with one element, so essentially you are doing

    chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=r)$p.value

Now B is supposed to be a single integer, so the above cannot be expected to do anything sensible, but you might have hoped for an error message. Instead, it seems that you get the result of r[1] replications divided by r+1:

    > chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=r)$p.value
    [1] 0.636363636 0.069306931 0.013972056 0.006993007 0.003498251 0.001399720
    > 7/(r+1)
    [1] 0.636363636 0.069306931 0.013972056 0.006993007 0.003498251 0.001399720

What you really wanted was

    > sapply(r, function (x) {
    +   chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value
    + })
    [1] 0.9090909 0.8118812 0.7964072 0.7672328 0.8025987 0.7932414

> Thank you,
> Sören

-- 
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
(p.dalga...@biostat.ku.dk)  Ph: (+45) 35327918  FAX: (+45) 35327907
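The two points in the reply above can be checked without running any simulation. The following is only a minimal sketch: `mc_p` is a hypothetical helper (not part of R) that writes out the usual Monte Carlo p-value estimate, and the count of 6 exceedances is an assumption chosen because it reproduces the 7/(r+1) pattern shown above.

```r
r <- c(10, 100, 500, 1000, 2000, 5000)

# list(r) is a list with ONE element (the whole vector),
# so sapply() calls the function once and passes all of r as x.
calls_list <- sapply(list(r), function(x) length(x))

# r itself has six elements, so sapply() calls the function
# six times, each time with a single number as x.
calls_vec <- sapply(r, function(x) length(x))

print(calls_list)  # one call that received 6 values
print(calls_vec)   # six calls that each received 1 value

# Hypothetical helper: Monte Carlo p-value estimate,
# (1 + number of simulated statistics >= observed) / (B + 1).
mc_p <- function(n_exceed, B) (1 + n_exceed) / (B + 1)

# ASSUMPTION: 6 of the r[1] = 10 simulated statistics exceeded the
# observed one. Dividing by the whole vector r + 1 then gives the
# "decreasing p-values" from the original post.
print(mc_p(6, r))
```

With a scalar B the same formula yields a single number, which is why looping over `r` directly gives one sensible p-value per replication count.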
Re: [R] chisq.test: decreasing p-value
On Mar 11, 2009, at 6:36 AM, soeren.vo...@eawag.ch wrote:

> A Likert scale may have produced counts of answers per category. According to theory, I may expect equality over the categories. A statistical test shall reveal whether that equality actually holds in my sample.
>
> When applying a chi-square test with an increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (it looks as if it converges to zero).
>
> (1) Why?

With low numbers of repetitions the test has low power, i.e., it may give you the wrong answer to the question: are those two vectors from the same distribution? As you increase the number, the simulated value approaches the truth.

> (2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same?

In principle they are not the same. Do you want a test that tells you they are?

> (3) By the way, how to deal with low frequency cells?
>
>     r <- c(10, 100, 500, 1000, 2000, 5000)
>     v <- c(35, 40, 45, 45, 40, 35)
>     sapply(list(r), function (x) {
>       chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value
>     })
>
> Thank you,
> Sören

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] chisq.test: decreasing p-value
Thanks to Peter Dalgaard for the correct answer. I misinterpreted what R was returning.

On Mar 11, 2009, at 7:32 AM, David Winsemius wrote:

> On Mar 11, 2009, at 6:36 AM, soeren.vo...@eawag.ch wrote:
>
>> A Likert scale may have produced counts of answers per category. According to theory, I may expect equality over the categories. A statistical test shall reveal whether that equality actually holds in my sample.
>>
>> When applying a chi-square test with an increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (it looks as if it converges to zero).
>>
>> (1) Why?
>
> With low numbers of repetitions the test has low power, i.e., it may give you the wrong answer to the question: are those two vectors from the same distribution? As you increase the number, the simulated value approaches the truth.
>
>> (2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same?
>
> In principle they are not the same. Do you want a test that tells you they are?
>
>> (3) By the way, how to deal with low frequency cells?
>>
>>     r <- c(10, 100, 500, 1000, 2000, 5000)
>>     v <- c(35, 40, 45, 45, 40, 35)
>>     sapply(list(r), function (x) {
>>       chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value
>>     })

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] chisq.test: decreasing p-value
Thanks to Peter, David, and Michael!

After having corrected the coding error, the p-values converge to a particular value, not necessarily zero.

The whole story is: 634 respondents in 6 different areas marked their answer on a 7-step Likert scale (very bad, bad, ..., very good; later recoded to 5 scale levels). The statistical question now is whether the answer distributions (number of goods, bads, etc.) in each area differ from the mean answer distribution, calculated by summing up all goods, bads, etc. over the areas. In any case, an omnibus chi-square test would not answer my question, and because of spurious significances I'd rather go back to my chi-square book ;-) (for the interested, see http://sozmod.eawag.ch/files/file.Robj for the entire table).

Thanks for your help,
Sören
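The convergence mentioned above can be illustrated with the data from the original post. This is only a sketch under stated assumptions: the seed is arbitrary, `p0`, `p_sim`, `stat` and `p_asym` are names introduced here, and `rep(1/6, 6)` is used in place of `rep.int(40, 6)` with `rescale.p` because both describe equal category probabilities.

```r
set.seed(1)  # arbitrary seed, only so that the sketch is repeatable

r  <- c(10, 100, 500, 1000, 2000, 5000)
v  <- c(35, 40, 45, 45, 40, 35)
p0 <- rep(1/6, 6)  # equal probabilities for the six categories

# One simulated p-value per number of replicates B (scalar B each time)
p_sim <- sapply(r, function(b) {
  chisq.test(v, p = p0, simulate.p.value = TRUE, B = b)$p.value
})

# Asymptotic reference: chi-square approximation with length(v) - 1 = 5 df
fit    <- chisq.test(v, p = p0)
stat   <- unname(fit$statistic)
p_asym <- unname(fit$p.value)

print(data.frame(B = r, p_sim = p_sim))
print(c(statistic = stat, p_asymptotic = p_asym))
```

The statistic here is (4 * 5^2) / 40 = 2.5, since four cells deviate by 5 from the expected count of 40. As B grows, the simulated p-values should settle near the asymptotic p-value rather than drift towards zero; for small B they are noticeably noisier.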