Re: [R] Clean up a complex variable

2009-05-23 Thread andyer weng
Dear all,
I need to clean up one variable in a dataset.
For example, say the dataset is trial and the variable to clean up is V1:
trial$V1
[1] 0(a=1)   0(b=1)  0.133(b=1)   0.555(a=1)  5.32(a=1)
What I need to do is remove the text (a=1) and (b=1), and the " characters, from
V1, and then convert it to a numeric variable. I am also required to divide the
value by 5 whenever it has a=1.
What I did is:
trialchara <- as.character(trial$V1)
trialnum <- gsub("(a=1)|(b=1)|\"", "", trialchara)
The result is
[1] 0 () 0 () 0.133 () 0.555 () 5.32 ()
How can I get rid of the () symbols?
How can I handle the second part: when the value has a=1, it needs to be
divided by 5?
Can anyone please give me some hints here?
Thanks a lot.
John
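A minimal sketch of one way to handle both parts, assuming V1 holds strings exactly like those printed above (the data frame below is a hypothetical reconstruction of them). In the pattern "(a=1)" the parentheses are regex grouping operators, so only a=1 is matched and the () is left behind; escaping them as \\( and \\) makes them match literally.

```r
# Hypothetical reconstruction of trial$V1 from the values printed in the post
trial <- data.frame(V1 = c("0(a=1)", "0(b=1)", "0.133(b=1)", "0.555(a=1)", "5.32(a=1)"))

v1 <- as.character(trial$V1)

# Remember which values carry the (a=1) tag before the text is stripped
is.a <- grepl("(a=1)", v1, fixed = TRUE)

# Escaped parentheses are matched literally; the last alternative drops any " characters
num <- as.numeric(gsub("\\([ab]=1\\)|\"", "", v1))

# Divide the a=1 values by 5 and leave the others unchanged
trial$V1.clean <- ifelse(is.a, num / 5, num)
trial$V1.clean
```

Recording is.a before stripping the tags keeps the a=1 information, which is otherwise lost once the text is removed.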


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Categorical Response Query

2008-10-20 Thread andyer weng
Hi all,

I have a question about categorical responses.

I have a data frame containing age, sex, class and success (1 = success,
0 = non-success).
age, sex and class are the explanatory variables, and success is the
response variable. I can also get n (the number of times each age
occurs) and r (the number of successes at that age).

When I try to create the regression relationship for these variables, I
see many different approaches, and I wonder which one fits my
situation best.

1st case:
xxx.glm <- glm(success ~ age*sex*class, family=binomial, data=xxx.data)

2nd case:

xxx.glm <- glm(r/n ~ age*sex*class, family=binomial, data=xxx.data)

3rd case:

xxx.glm <- glm(cbind(r, n-r) ~ age*sex*class, family=binomial, data=xxx.data)

What is the difference between the above three cases? Which one is the best to use?

If I don't group the data, can I use the 1st case? If I group the
data, can I use the 2nd or 3rd case?

Please advise.

Cheers.
Andyer
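A minimal sketch comparing the three forms on simulated data with hypothetical sex and class factors only (age is left out for brevity). Case 1 fits one 0/1 row per individual; case 3 fits one row per group with cbind(successes, failures). Both give the same coefficient estimates, although the deviances and residual degrees of freedom differ because the data are arranged differently. Case 2 matches case 3 only if the group sizes are supplied as weights = n; without them, each proportion is treated as a single trial and glm warns about non-integer successes.

```r
set.seed(1)
# Simulated individual-level data (hypothetical; stands in for xxx.data)
ind <- data.frame(
  sex   = factor(sample(c("female", "male"), 300, replace = TRUE)),
  class = factor(sample(c("1st", "2nd", "3rd"), 300, replace = TRUE))
)
ind$success <- rbinom(300, 1, ifelse(ind$sex == "female", 0.7, 0.3))

# Case 1: ungrouped data, one 0/1 response per row
fit1 <- glm(success ~ sex * class, family = binomial, data = ind)

# Group the data: r = successes and n = trials in each sex/class cell
grp <- aggregate(data.frame(r = ind$success, n = 1),
                 by = list(sex = ind$sex, class = ind$class), FUN = sum)

# Case 2: proportions, with the group sizes supplied as prior weights
fit2 <- glm(r / n ~ sex * class, family = binomial, weights = n, data = grp)

# Case 3: two-column response of successes and failures
fit3 <- glm(cbind(r, n - r) ~ sex * class, family = binomial, data = grp)

# The estimates agree across the three fits; the deviances do not
rbind(case1 = coef(fit1), case2 = coef(fit2), case3 = coef(fit3))
```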



[R] Fwd: Categorial Response Questions

2008-10-18 Thread andyer weng
Hi all,

Regarding my question in the first email below, I found that I made a mistake
in the code in my previous email. What I was trying to type should
be

> grouped.titanic.df <- data.frame(group.age.group=sort(unique(titanic.df$age.group)),
+ expand.grid(sex=sort(unique(titanic.df$sex)),pclass=sort(unique(titanic.df$pclass))),
+ r=as.vector(tapply(titanic.df$survived,titanic.df$age.group,sum)),
+ n=as.vector(tapply(titanic.df$survived,titanic.df$age.group,length)))

Error in data.frame(group.age.group = sort(unique(titanic.df$age.group)),  :
  arguments imply differing number of rows: 8, 6


Please advise: what have I done wrong, and why does the error message come up?
Am I taking the right approach to the question I mentioned in the
first email (the bottom one)?

Cheers. Andyer
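The error arises because data.frame() is given pieces of different lengths: sort(unique(age.group)) has 8 values, expand.grid() over sex and pclass has 2 × 3 = 6 rows, and tapply() by age.group alone returns 8 totals. One row per age.group/sex/pclass combination needs all three factors crossed (8 × 2 × 3 = 48 rows) and tapply() grouped by the same three factors. A minimal sketch, using simulated data in place of titanic.df (all values are hypothetical); it relies on tapply() and expand.grid() both varying their first factor fastest, so the flattened counts line up with the grid rows. A combination with no passengers would get NA for r and n.

```r
set.seed(2)
age.levels <- c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79")
# Simulated stand-in for titanic.df (hypothetical values)
titanic.df <- data.frame(
  pclass    = factor(sample(c("1st", "2nd", "3rd"), 1000, replace = TRUE)),
  sex       = factor(sample(c("female", "male"), 1000, replace = TRUE)),
  age.group = factor(sample(age.levels, 1000, replace = TRUE), levels = age.levels),
  survived  = rbinom(1000, 1, 0.4)
)

# Group by all three factors, in the same order as the grid below
by.cell <- list(titanic.df$age.group, titanic.df$sex, titanic.df$pclass)

grouped.titanic.df <- data.frame(
  expand.grid(age.group = levels(titanic.df$age.group),
              sex       = levels(titanic.df$sex),
              pclass    = levels(titanic.df$pclass)),
  # as.vector() flattens the 8 x 2 x 3 array with age.group varying fastest,
  # matching the row order of expand.grid()
  r = as.vector(tapply(titanic.df$survived, by.cell, sum)),
  n = as.vector(tapply(titanic.df$survived, by.cell, length))
)
nrow(grouped.titanic.df)
```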


-- Forwarded message --
From: andyer weng [EMAIL PROTECTED]
Date: 2008/10/18
Subject: Fwd: Categorial Response Questions
To: r-help@r-project.org


Hi all,

Me again. I tried typing the following code for my question below,
but an error message comes up. Please advise whether the approach I was
trying will solve the question stated in my previous email. If
so, please advise what is wrong with my code.
(P.S. All the data are stored in xxx.df.)

> grouped.xxx.df <- data.frame(group.age.group=sort(unique(xxx.df$age.group)),
+ expand.grid(sex=c("female","male"),age.group=c("0-9","10-19","20-29","30-39","40-49","50-59","60-69","70-79")),
+ r=tapply(xxx.df$survived,titanic.df$age.group,sum),
+ n=tapply(xxx.df$survived,titanic.df$age.group,length))

Error in data.frame(group.age.group = sort(unique(xxx.df$age.group)),  :
 arguments imply differing number of rows: 8, 16
In addition: Warning messages:
1: In Ops.factor(left) : + not meaningful for factors
2: In Ops.factor(left) : + not meaningful for factors


Thanks a million.

Regards,
Andyer






-- Forwarded message --
From: andyer weng [EMAIL PROTECTED]
Date: 2008/10/18
Subject: Fwd: Categorial Response Questions
To: r-help@r-project.org


Sorry guys, I pressed the wrong button and sent out an incomplete message.

Let me do it again.

My purpose for the questions below is to assess the effect of class, age
and sex on survival.


I have a data set containing:

pclass: A factor giving the class of the passenger: one of 1st, 2nd, 3rd.
age: The age of the passenger in years.
sex: Passenger's gender: female or male.
age.group: Passenger's age group, one of 0‐9, 10‐19, 20‐29,
30‐39, 40‐49, 50‐59, 60‐69, 70‐79.
survived: Passenger's survival (1 = survived, 0 = did not survive).

Ignoring the variable age,
- I need to group the data into groups corresponding to each
age‐group/sex/class combination,
- I need to compute the logits for each combination.
- Make a data frame containing the logits, and the categorical
variables. I need to have one line in the data frame for each
combination of the factor levels.
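A minimal sketch of these three steps, on simulated data standing in for the real passenger records (all values are hypothetical). aggregate() produces one row per age.group/sex/pclass combination that occurs in the data, with r survivors out of n passengers. The empirical logit log(r / (n - r)) is infinite when r = 0 or r = n, so the sketch adds 0.5 to both counts; that adjustment is an assumption here, not something specified in the question.

```r
set.seed(3)
age.levels <- c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79")
# Simulated stand-in for the passenger data (hypothetical values)
titanic.df <- data.frame(
  pclass    = factor(sample(c("1st", "2nd", "3rd"), 1000, replace = TRUE)),
  sex       = factor(sample(c("female", "male"), 1000, replace = TRUE)),
  age.group = factor(sample(age.levels, 1000, replace = TRUE), levels = age.levels)
)
titanic.df$survived <- rbinom(1000, 1, ifelse(titanic.df$sex == "female", 0.7, 0.2))

# One row per age.group/sex/pclass combination: r = survivors, n = passengers
grouped <- aggregate(data.frame(r = titanic.df$survived, n = 1),
                     by = list(age.group = titanic.df$age.group,
                               sex       = titanic.df$sex,
                               pclass    = titanic.df$pclass),
                     FUN = sum)

# Empirical logit with 0.5 added to both counts so it stays finite
grouped$logit <- log((grouped$r + 0.5) / (grouped$n - grouped$r + 0.5))
head(grouped)
```

The grouped frame can then be passed to glm(cbind(r, n - r) ~ age.group + sex + pclass, family = binomial, data = grouped) to assess the effects on survival.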

Can someone please help with the R code for the above?

Thanks a million!

Cheers
Andyer.



Re: [R] Categorial Response Questions

2008-10-17 Thread andyer weng
Hi All,

I have a data set containing:
pclass: A factor giving the class of the passenger: one of 1st, 2nd, 3rd.
age: The age of the passenger in years.
sex: Passenger's gender: female or male.
age.group: Passenger's age group, one of 0‐9, 10‐19, 20‐29, 30‐39,
40‐49, 50‐59, 60‐69, 70‐79.
survived: Passenger's survival (1 = survived, 0 = did not survive).



[R] Fwd: Categorial Response Questions

2008-10-17 Thread andyer weng
Sorry guys, I pressed the wrong button and sent out an incomplete message.

Let me do it again.

I have a data set containing:

pclass: A factor giving the class of the passenger: one of 1st, 2nd, 3rd.
age: The age of the passenger in years.
sex: Passenger's gender: female or male.
age.group: Passenger's age group, one of 0‐9, 10‐19, 20‐29,
30‐39, 40‐49, 50‐59, 60‐69, 70‐79.
survived: Passenger's survival (1 = survived, 0 = did not survive).

Ignoring the variable age,
- I need to group the data into groups corresponding to each
age‐group/sex/class combination,
- I need to compute the logits for each combination.
- Make a data frame containing the logits, and the categorical
variables. I need to have one line in the data frame for each
combination of the factor levels.

Can someone please help with the R code for the above?

Thanks a million!

Cheers
Andyer.


[R] Fwd: Categorial Response Questions

2008-10-17 Thread andyer weng
Furthermore, my purpose for the questions below is to assess the effect
of class, age and sex on survival.

Cheers.




Re: [R] Help on R Coding

2008-10-05 Thread andyer weng
Hi all,

I am stuck using the predict function in R to make predictions
for a model with a continuous variable and categorical variables. I have
no problem fitting the model, e.g.

cabbage.lm2 <- lm(VitC ~ HeadWt + Date + Cult)

HeadWt is a continuous variable; Date and Cult are factors. Date has
three levels (d16, d20, d21) and Cult has two levels (c39, c52). I
need to calculate a confidence interval for the mean VitC for each
combination of Date and Cult, fixing the value of HeadWt at the mean
for the corresponding cell. I have already shown that Cult and Date
do not interact. The mean of HeadWt has also been found, e.g. 2.59.

When I type

> new.df <- data.frame(HeadWt=2.59, Cultc52=1, Dated16=1)
> predict(cabbage.lm2, new.df, interval="confidence")

the following error comes up:
Error in model.frame.default(Terms, newdata, na.action = na.action,
xlev = object$xlevels) :
  variable lengths differ (found for 'Cult')
In addition: Warning message:
'newdata' had 1 rows but variable(s) found have 60 rows


Is there anything I have done wrong? Please help with the code.

Thank you so much!

All the best.

Andyer.
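The error comes from newdata: predict() needs columns with the same names as the model's variables (HeadWt, Date, Cult), with the factors given by their level labels rather than dummy columns such as Cultc52 or Dated16. Because Date and Cult are missing from new.df and the model was fitted without a data argument, they are looked up in the formula's environment and found with 60 rows, hence the length mismatch. A minimal sketch, assuming the data are the cabbages data frame from the MASS package (same variable names, 60 rows); aggregate() supplies the HeadWt mean for each Date/Cult cell, and interval = "confidence" is passed as a quoted string.

```r
library(MASS)  # assumption: the data are MASS::cabbages (Cult, Date, HeadWt, VitC; 60 rows)

cabbage.lm2 <- lm(VitC ~ HeadWt + Date + Cult, data = cabbages)

# One row per Date/Cult cell, with HeadWt fixed at that cell's mean
new.df <- aggregate(HeadWt ~ Date + Cult, data = cabbages, FUN = mean)

# Confidence interval for the mean VitC in each cell
ci <- predict(cabbage.lm2, newdata = new.df, interval = "confidence")
cbind(new.df, ci)
```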



Re: [R] Help on R Coding

2008-10-04 Thread andyer weng