Re: [R] Clean up a complex variable
Dear all,

I need to clean up one variable in a dataset. For example, say the dataset is trial and the variable to clean up is V1:

trial$V1
[1] 0(a=1) 0(b=1) 0.133(b=1) 0.555(a=1) 5.32(a=1)

What I need to do is remove the text (a=1) and (b=1) from V1 and then convert it to a numeric variable. I am also asked that, when the value has a=1, the value be divided by 5.

What I did is:

trialchara<-as.character(trial$V1)
trialnum<-gsub("(a=1)|(b=1)|","",trialchara)

The result is

[1] 0 () 0 () 0.133 () 0.555 () 5.32 ()

How can I get rid of the () symbols? And how can I do the part where, when the value has a=1, the value needs to be divided by 5? Can anyone please give me some hints here?

Thanks a lot.
John

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
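One possible approach, offered as a sketch rather than a definitive answer. In a regular expression, unescaped parentheses only group, so a pattern like (a=1) matches the text a=1 and leaves the brackets behind. Escaping them, or matching the whole bracketed tail at once, removes them. The data frame below is rebuilt from the values printed in the post; has_a, num and V1_clean are names made up for illustration.

```r
# Sample data reconstructed from the values shown in the post
trial <- data.frame(
  V1 = c("0(a=1)", "0(b=1)", "0.133(b=1)", "0.555(a=1)", "5.32(a=1)")
)

v <- as.character(trial$V1)

# Remember which entries carry the a=1 tag before stripping it
# (fixed = TRUE treats the pattern as literal text, not a regex)
has_a <- grepl("(a=1)", v, fixed = TRUE)

# Drop the trailing "(...)" part; the brackets are escaped so they match literally
num <- as.numeric(sub("\\(.*\\)$", "", v))

# Divide by 5 only where the tag was a=1
trial$V1_clean <- ifelse(has_a, num / 5, num)
trial
```

With these five values, V1_clean holds 0, 0, 0.133, 0.111 and 1.064.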
Re: [R] Categorical Response Query
Hi all,

I have a question about categorical responses. I have a data frame containing age, sex, class and success (1 = success, 0 = no success). age, sex and class are the explanatory variables, and success is the response variable. I can also get n (the number of times each age occurs) and r (the number of successes at that age).

When I try to fit a regression for these variables, I have seen many different forms, and I wonder which one fits my situation best.

1st case: xxx.glm<-glm(success~age*sex*class,family=binomial, data=xxx.data)
2nd case: xxx.glm<-glm(r/n~age*sex*class,family=binomial, data=xxx.data)
3rd case: xxx.glm<-glm(cbind(r,n-r)~age*sex*class,family=binomial, data=xxx.data)

What is the difference between the three cases above? Which one is best to use? If I don't group the data, can I use the 1st case? If I group the data, can I use the 2nd or 3rd case?

Please advise.

Cheers.
Andyer
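A small sketch of how the three forms relate, on simulated data. The data frame d, its grouped version g and the additive formula are assumptions made for brevity, not the poster's data. The first form takes ungrouped 0/1 responses. The second takes proportions and needs weights = n so that glm() knows how many trials each proportion is based on. The third takes a two-column matrix of successes and failures. Fitted to the same data, all three should give the same coefficient estimates.

```r
set.seed(1)

# Hypothetical ungrouped data: one row per person, success coded 0/1
d <- data.frame(
  sex   = factor(sample(c("female", "male"), 200, replace = TRUE)),
  class = factor(sample(c("1st", "2nd", "3rd"), 200, replace = TRUE))
)
d$success <- rbinom(200, 1, ifelse(d$sex == "female", 0.7, 0.3))

# Grouped version: r successes out of n trials in each sex/class cell
g <- aggregate(success ~ sex + class, data = d, FUN = sum)
names(g)[3] <- "r"
g$n <- aggregate(success ~ sex + class, data = d, FUN = length)$success

# Case 1: ungrouped 0/1 response
fit1 <- glm(success ~ sex + class, family = binomial, data = d)

# Case 2: proportion response, with the number of trials passed as weights
fit2 <- glm(r / n ~ sex + class, family = binomial, weights = n, data = g)

# Case 3: two-column response (successes, failures)
fit3 <- glm(cbind(r, n - r) ~ sex + class, family = binomial, data = g)

# Coefficient estimates side by side
round(cbind(case1 = coef(fit1), case2 = coef(fit2), case3 = coef(fit3)), 4)
```

The residual deviance and its degrees of freedom differ between the ungrouped fit and the grouped fits, because the saturated model they are compared against is different, so those numbers are not interchangeable even though the estimates agree.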
[R] Fwd: Categorial Response Questions
Hi all,

Regarding the question in my first email, I found I made a mistake in the code in my previous email. What I meant to type was:

grouped.titanic.df<-data.frame(group.age.group=sort(unique(titanic.df$age.group)),
+ expand.grid(sex=sort(unique(titanic.df$sex)),pclass=sort(unique(titanic.df$pclass))),
+ r=as.vector(tapply(titanic.df$survived,titanic.df$age.group,sum)),
+ n=as.vector(tapply(titanic.df$survived,titanic.df$age.group,length)))
Error in data.frame(group.age.group = sort(unique(titanic.df$age.group)), :
  arguments imply differing number of rows: 8, 6

Please advise what I have done wrong. Why does the error message come up? Am I doing the right thing to solve the question I described in my first email?

Cheers.
Andyer
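A note on the error, with a sketch of one way around it. data.frame() needs its arguments to have matching numbers of rows. In the call above, sort(unique(age.group)) has 8 entries while expand.grid() over 2 sexes and 3 classes has 6 rows, which is where "8, 6" comes from. In addition, r and n are tabulated by age group alone, so they would not line up with sex and class anyway. Cross-tabulating on all three factors at once gives one row per combination. The simulated titanic.df below is only a stand-in for the real data, and grouped is a made-up name.

```r
set.seed(42)
ages <- c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79")

# Simulated stand-in for titanic.df (values are made up)
titanic.df <- data.frame(
  age.group = factor(sample(ages, 500, replace = TRUE), levels = ages),
  sex       = factor(sample(c("female", "male"), 500, replace = TRUE)),
  pclass    = factor(sample(c("1st", "2nd", "3rd"), 500, replace = TRUE)),
  survived  = rbinom(500, 1, 0.4)
)

# n: number of passengers in each age.group/sex/pclass cell
n_tab <- xtabs(~ age.group + sex + pclass, data = titanic.df)

# r: number of survivors in each cell (xtabs sums the left-hand side)
r_tab <- xtabs(survived ~ age.group + sex + pclass, data = titanic.df)

# Both tables have the same layout, so their long forms line up row by row
grouped <- as.data.frame(n_tab, responseName = "n")
grouped$r <- as.data.frame(r_tab, responseName = "r")$r

nrow(grouped)  # 8 age groups x 2 sexes x 3 classes
head(grouped)
```

Cells with no passengers are kept with n = 0 and r = 0, so they may need to be dropped before modelling.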
Re: [R] Categorial Response Questions
Hi All,

I have a data set containing:

pclass: A factor giving the class of the passenger: one of 1st, 2nd, 3rd.
age: The age of the passenger in years.
sex: Passenger's gender: female or male.
age.group: Passenger's age group, one of 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79.
survived: Passenger's survival (1=survived, 0=did not survive).
[R] Fwd: Categorial Response Questions
Sorry guys, I pressed the wrong button and sent out the incomplete message. Let me do it again.

I have a data set containing:

pclass: A factor giving the class of the passenger: one of 1st, 2nd, 3rd.
age: The age of the passenger in years.
sex: Passenger's gender: female or male.
age.group: Passenger's age group, one of 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79.
survived: Passenger's survival (1=survived, 0=did not survive).

Ignoring the variable age:
- I need to group the data into groups corresponding to each age-group/sex/class combination.
- I need to compute the logits for each combination.
- I need to make a data frame containing the logits and the categorical variables, with one line in the data frame for each combination of the factor levels.

Can someone please help with the R code for the above? Thanks millions!

Cheers
Andyer.
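The three steps listed in this message could be sketched roughly as follows. This is only an illustration on simulated data: titanic.df here is a made-up stand-in, grouped is a hypothetical name, and the 0.5 added in the logit is a common adjustment to keep it finite when a cell has no survivors or no deaths, not something taken from the post.

```r
set.seed(123)
ages <- c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79")

# Simulated stand-in for the real data (values are made up)
titanic.df <- data.frame(
  age.group = factor(sample(ages, 600, replace = TRUE), levels = ages),
  sex       = factor(sample(c("female", "male"), 600, replace = TRUE)),
  pclass    = factor(sample(c("1st", "2nd", "3rd"), 600, replace = TRUE)),
  survived  = rbinom(600, 1, 0.4)
)

# Step 1: group by age.group/sex/pclass; r = survivors, n = passengers per group
r <- aggregate(survived ~ age.group + sex + pclass, data = titanic.df, FUN = sum)
n <- aggregate(survived ~ age.group + sex + pclass, data = titanic.df, FUN = length)

# Step 3 (frame first, then logits): one row per combination present in the data.
# aggregate() only returns combinations that actually occur.
grouped <- data.frame(r[, c("age.group", "sex", "pclass")],
                      r = r$survived, n = n$survived)

# Step 2: empirical logit, log odds of survival in each group,
# with 0.5 added to numerator and denominator (an assumption, see above)
grouped$logit <- log((grouped$r + 0.5) / (grouped$n - grouped$r + 0.5))

head(grouped)
```

For a model fit, the counts can be used directly, e.g. with a cbind(r, n - r) response in glm(), rather than regressing on the logits.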
[R] Fwd: Categorial Response Questions
Furthermore, my purpose for the questions in my previous message is to assess the effect of class, age and sex on survival.

Cheers.
[R] Fwd: Categorial Response Questions
Hi all, me again.

I tried the following code for the question in my previous email, but it gives an error message. Please advise whether what I was trying to do will solve that question and, if so, what is wrong with my code. (P.S. All the data are stored in xxx.df.)

grouped.xxx.df<-data.frame(group.age.group=sort(unique(xxx.df$age.group)),
+ expand.grid(sex=c("female","male"),age.group=c("0-9","10-19","20-29","30-39","40-49","50-59","60-69","70-79")),
+ r=tapply(xxx.df$survived,titanic.df$age.group,sum),
+ n=tapply(xxx.df$survived,titanic.df$age.group,length))
Error in data.frame(group.age.group = sort(unique(xxx.df$age.group)), :
  arguments imply differing number of rows: 8, 16
In addition: Warning messages:
1: In Ops.factor(left) : + not meaningful for factors
2: In Ops.factor(left) : + not meaningful for factors

Thanks millions.

Regards,
Andyer
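A possible reading of this error, with a sketch of a fix that stays close to the posted code. expand.grid() over 2 sexes and 8 age groups has 16 rows, while the age-group column and the tapply() results (tabulated by age group only) have 8, hence "8, 16". The two Ops.factor warnings are what R emits when a unary + is applied to a data frame with two factor columns, which could happen if a "+" continuation prompt was pasted in front of expand.grid(...); that part is a guess from the message text. Tabulating by both factors makes the pieces line up. The simulated xxx.df and the name by_cell are assumptions for illustration.

```r
set.seed(7)
ages <- c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79")

# Simulated stand-in for xxx.df (values are made up)
xxx.df <- data.frame(
  sex       = factor(sample(c("female", "male"), 400, replace = TRUE)),
  age.group = factor(sample(ages, 400, replace = TRUE), levels = ages),
  survived  = rbinom(400, 1, 0.4)
)

# Tabulate by sex AND age.group, in the same order as expand.grid() below
by_cell <- list(xxx.df$sex, xxx.df$age.group)

# tapply() returns a 2 x 8 matrix; as.vector() unrolls it with sex varying
# fastest, which matches expand.grid(), where the first factor varies fastest
grouped.xxx.df <- data.frame(
  expand.grid(sex = levels(xxx.df$sex), age.group = levels(xxx.df$age.group)),
  r = as.vector(tapply(xxx.df$survived, by_cell, sum)),
  n = as.vector(tapply(xxx.df$survived, by_cell, length))
)

nrow(grouped.xxx.df)  # 2 sexes x 8 age groups
head(grouped.xxx.df)
```

A combination with no passengers would show up as NA in r and n here. Adding pclass works the same way: include it in both the list given to tapply() and the expand.grid() call, in the same order.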
Re: [R] Help on R Coding
Hi all,

I am stuck using the predict function in R to make predictions from a model with a continuous variable and categorical variables. I have no problem fitting the model, which is e.g.

cabbage.lm2<- lm(VitC ~ HeadWt + Date + Cult)

HeadWt is a continuous variable; Date and Cult are factors. Date has three levels (d16, d20, d21) and Cult has two levels (c39, c52). I need to calculate a confidence interval for the mean VitC for each combination of Date and Cult, fixing the value of HeadWt at the mean for the corresponding cell. I have already shown that Cult and Date do not interact. The mean of HeadWt has also been found, e.g. 2.59.

When I type

new.df<-data.frame(HeadWt=2.59,Cultc52=1,Dated16=1)
predict(cabbage.lm2,new.df, interval="confidence")

this error comes up:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  variable lengths differ (found for 'Cult')
In addition: Warning message:
'newdata' had 1 rows but variable(s) found have 60 rows

Is there anything I have done wrong? Please help with the code. Thank you so much!

All the best.
Andyer.
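A likely cause, with a sketch of one fix. predict() looks in newdata for columns named exactly as in the model formula (HeadWt, Date, Cult) and holding factor levels such as "d16" or "c52". Cultc52 and Dated16 are coefficient names, not variables, so R does not find Cult in new.df, falls back to the 60-row Cult in the workspace, and then complains that the lengths differ. The data below are simulated only so that the example runs; cabbage and ci are made-up names.

```r
set.seed(1)

# Simulated stand-in for the cabbage data: 2 cultivars x 3 dates x 10 heads = 60 rows
cabbage <- expand.grid(Cult = c("c39", "c52"),
                       Date = c("d16", "d20", "d21"),
                       rep  = 1:10)
cabbage$HeadWt <- round(runif(60, 1, 4), 2)
cabbage$VitC   <- 60 - 5 * cabbage$HeadWt +
                  ifelse(cabbage$Cult == "c52", 8, 0) + rnorm(60, sd = 5)

# Passing data = keeps the model from depending on workspace variables
cabbage.lm2 <- lm(VitC ~ HeadWt + Date + Cult, data = cabbage)

# One row per Date/Cult cell, with HeadWt set to that cell's mean
new.df <- aggregate(HeadWt ~ Date + Cult, data = cabbage, FUN = mean)

# Confidence interval for the mean VitC in each cell
ci <- cbind(new.df, predict(cabbage.lm2, new.df, interval = "confidence"))
ci
```

For a single cell the same idea applies, e.g. data.frame(HeadWt = 2.59, Date = "d16", Cult = "c52") as newdata.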