[R] testing independence of categorical variables
Hi,

Is there a way of calculating or measuring dependence between two categorical variables? I tried using the chi-square test to test for independence, but I got an error saying that the lengths of the two vectors don't match. Suppose X and Y are two factors. X has 5 levels and Y has 7 levels. This is what I tried:

> temp <- chisq.test(x, y)

but I got the error "the lengths of the two vectors don't match". Any help will be appreciated.

--
Regards,
Rana Shoaaib Mehmood

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] testing independence of categorical variables
Hi,

When testing whether random variables X and Y are independent, the usual assumption is that you have n pairs of outcomes, (X1,Y1), (X2,Y2), ..., (Xn,Yn), and you are basically checking whether the value of X affects the value of Y. If you have 7 observations of X and 5 separate observations of Y (which have nothing to do with the observations of X), you cannot test for independence.

Regards,
Moshe.
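A minimal sketch of the paired layout described above may help. The data frame `dat`, its column names, and its randomly generated values are made up for illustration; the point is only that the two factors have different numbers of levels (5 and 7) but the same number of observations.

```r
set.seed(42)
n <- 200  # number of paired observations (hypothetical)

# One row per observation: each row carries an x value AND a y value.
dat <- data.frame(
  x = factor(sample(LETTERS[1:5], n, replace = TRUE), levels = LETTERS[1:5]),
  y = factor(sample(letters[1:7], n, replace = TRUE), levels = letters[1:7])
)

nlevels(dat$x)                   # number of levels of x
nlevels(dat$y)                   # number of levels of y
length(dat$x) == length(dat$y)   # the lengths must match, not the levels

# Cross-tabulating the pairs gives a 5 x 7 contingency table.
tab <- table(dat$x, dat$y)
dim(tab)
```

Keeping both variables in one data frame makes it hard to end up with vectors of different lengths by accident.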
Re: [R] testing independence of categorical variables
If you posted the table, it might be clearer why the error was being thrown. In the example shown you have mixed "x" and "X"; these are different objects in R. chisq.test should not have a problem with unequal numbers of rows and columns.

# simulate a 5 x 7 table
> TT <- r2dtable(1, 5*c(1,8,5,8,4), 5*c(3,3,3,3,4,4,6))
> TT
[[1]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    1    1    0    2    1    0
[2,]    3    3    6    6    2    8   12
[3,]    1    2    3    3    9    2    5
[4,]    8    3    3    3    6    7   10
[5,]    3    6    2    3    1    2    3

# general test for association
# (TT holds a single table; chisq.test ignores a second argument when x is a matrix)
> chisq.test(TT[[1]])

        Pearson's Chi-squared test

data:  TT[[1]]
X-squared = 33.5942, df = 24, p-value = 0.09214

Warning message:
In chisq.test(TT[[1]]) : Chi-squared approximation may be incorrect

--
David Winsemius
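The warning in the output above typically appears when some expected cell counts are small. One possible way to handle it, offered here as an assumption about what might suit such data rather than something proposed in the thread, is the `simulate.p.value` argument of chisq.test, which replaces the asymptotic p-value with a Monte Carlo one. The seed and the number of replicates `B` are arbitrary choices.

```r
set.seed(1)

# Same margins as in the simulated 5 x 7 table above.
tab <- r2dtable(1, 5 * c(1, 8, 5, 8, 4), 5 * c(3, 3, 3, 3, 4, 4, 6))[[1]]

# Asymptotic test; the warning is suppressed so the expected counts can be inspected.
res_asym <- suppressWarnings(chisq.test(tab))
min(res_asym$expected)   # smallest expected count, well below 5 here

# Monte Carlo p-value: same statistic, no reliance on the chi-squared approximation.
res_sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
res_sim$p.value
```

The test statistic is the same in both calls; only the way the p-value is obtained differs.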
Re: [R] testing independence of categorical variables
I can't find help for xtab. Which package contains this function?

On Nov 24, 2007 12:16 PM, G Ilhamto <[EMAIL PROTECTED]> wrote:
> Hi Shoaaib,
> have you tried xtab instead of chisq.test?
>
> Ilham

--
Regards,
Rana Shoaaib Mehmood
Re: [R] testing independence of categorical variables
Hi Shoaaib,

Try using chisq.test(table(x, y)). Note that chisq.test() performs a goodness-of-fit test only when it is given a single vector (or a one-row or one-column matrix); given two factors of the same length, chisq.test(x, y) builds the same contingency table itself.

--
Bernardo Rangel Tura, M.D., Ph.D.
National Institute of Cardiology
Brazil
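The different ways of calling chisq.test can be compared directly. This is a small sketch on made-up data (the vectors `x` and `y` and the seed are hypothetical): the two-factor call and the table call should agree, while passing a one-way table of a single variable runs a goodness-of-fit test instead.

```r
set.seed(7)
n <- 300
x <- factor(sample(LETTERS[1:5], n, replace = TRUE))   # 5 levels
y <- factor(sample(letters[1:7], n, replace = TRUE))   # 7 levels, same length as x

# Test of independence, called two ways.
res_vec <- suppressWarnings(chisq.test(x, y))
res_tab <- suppressWarnings(chisq.test(table(x, y)))
all.equal(res_vec$statistic, res_tab$statistic)
res_tab$parameter   # degrees of freedom: (5 - 1) * (7 - 1)

# A one-way table of a single variable triggers a goodness-of-fit test.
res_gof <- chisq.test(table(x))
res_gof$method
```

Checking the `method` element of the result is a quick way to confirm which test was actually run.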
Re: [R] testing independence of categorical variables
xtab is in the prettyR package.
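If installing an extra package is not an option, base R's xtabs() (in the stats package, a different function from prettyR's xtab) builds a cross-tabulation from a formula. The sketch below uses a made-up data frame `dat`; calling summary() on the resulting table reports a chi-squared test of independence.

```r
set.seed(3)

# Hypothetical paired data: one row per observation.
dat <- data.frame(
  x = factor(sample(LETTERS[1:5], 150, replace = TRUE), levels = LETTERS[1:5]),
  y = factor(sample(letters[1:7], 150, replace = TRUE), levels = letters[1:7])
)

# Cross-tabulate x by y using the formula interface.
tab <- xtabs(~ x + y, data = dat)
tab

# summary() on a contingency table includes a test for independence of the factors.
summary(tab)
```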
Re: [R] testing independence of categorical variables
The chi-square test does not need your two categorical variables to have equal numbers of levels, nor is there a limit on the number of levels.

The chi-square procedure is as follows:

    \chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}

    E_{ij} = n \left( \frac{R_i}{n} \right) \left( \frac{C_j}{n} \right) = \frac{R_i C_j}{n}

    df = (r - 1)(c - 1)

where R_i is the i-th row total, C_j is the j-th column total, n is the grand total, and r and c are the numbers of rows and columns.

This should not give you any errors if your calculations are all correct. I usually use SAS for calculations like this. Below is sample code I wrote to test whether US state and blood type are independent. You can modify it for your data and it should give you no error.

data bloodtype;
  input bloodtype $ state $ count @@;
  datalines;
A FL 122   B FL 117
AB FL 19   O FL 244
A IA 1781  B IA 351
AB IA 289  O IA 3301
A MO 353   B MO 269
AB MO 60   O MO 713
;
proc freq data=bloodtype;
  tables bloodtype*state / cellchi2 chisq expected norow nocol nopercent;
  weight count;
run;

Best,
Ramin
Gainesville
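Since this is the R list, here is a hedged R sketch of the same calculation, using the blood type counts from the SAS example above. The object names (`counts`, `expected`, `stat`) are my own; the statistic is computed by hand from the formulas and then compared with chisq.test.

```r
# Blood type by state counts from the SAS example, entered column by column.
counts <- matrix(
  c( 122,  117,  19,  244,    # FL: A, B, AB, O
    1781,  351, 289, 3301,    # IA
     353,  269,  60,  713),   # MO
  nrow = 4,
  dimnames = list(bloodtype = c("A", "B", "AB", "O"),
                  state     = c("FL", "IA", "MO"))
)

n <- sum(counts)

# Expected count for cell (i, j): row total * column total / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / n

# Pearson statistic, degrees of freedom, and upper-tail p-value.
stat <- sum((counts - expected)^2 / expected)
df   <- (nrow(counts) - 1) * (ncol(counts) - 1)
p    <- pchisq(stat, df, lower.tail = FALSE)

# Built-in test on the same table, for comparison.
res <- chisq.test(counts)
all.equal(unname(res$statistic), stat)
```

Because the data are already counts, the matrix is passed to chisq.test directly; there is no need to expand it back to one row per person.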
Re: [R] testing independence of categorical variables
Hi

Well, R does exactly what it says. From the help page:

"Otherwise, x and y must be vectors or factors of the same length"

I do not know SAS, but I presume that

    tables bloodtype*state

gives you something like

    tab <- table(bloodtype, state)

and chisq.test(tab) will give you the expected result. You can also call chisq.test(bloodtype, state) directly. But what you cannot do is test vectors of unequal **lengths**, and that is what he did. I believe that you cannot do it in SAS either.

> x <- sample(letters[1:3], 10, replace=TRUE)
> x
 [1] "c" "a" "c" "c" "a" "c" "a" "c" "a" "a"
> y <- sample(1:5, 20, replace=TRUE)
> y
 [1] 2 5 1 1 2 5 2 3 1 5 5 5 1 5 5 3 2 2 5 1
> chisq.test(x, y)
Error in chisq.test(x, y) : 'x' and 'y' must have the same length
> x <- sample(letters[1:3], 20, replace=TRUE)
> chisq.test(x, y)

        Pearson's Chi-squared test

data:  x and y
X-squared = 4.7937, df = 6, p-value = 0.5705

Warning message:
In chisq.test(x, y) : Chi-squared approximation may be incorrect

Regards
Petr