Re: [R] converting stata's by syntax to R
Chris Wallace [EMAIL PROTECTED] writes:

> I am struggling with migrating some stata code to R

Thanks to all who replied. It was very helpful to see a combination of more direct stata-R translations and more R-ish code. which.max() solves my problem this time, but learning about split(), unsplit() and duplicated() should make such problems fewer in the long run.

C.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] converting stata's by syntax to R
I am struggling with migrating some stata code to R. I have a data frame containing, sometimes, repeat observations (rows) of the same family. I want to keep only one observation per family, selecting that observation according to some other variable. An example data frame is:

    # construct example data
    fam <- c(1,2,3,3,4,4,4)
    wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
    keep <- c(1,1,1,0,1,0,0)
    dat <- as.data.frame(cbind(fam,wt,keep))
    dat

I want to keep the observation for which wt is a maximum, and where this doesn't identify a unique observation, to keep just one anyway, not caring which. Those observations are indicated above by keep==1. (Note, keep <- c(1,1,1,0,0,1,0) would be fine too, but not c(1,1,1,0,0,0,1)).

The stata code I would use is

    bys fam (wt): keep if _n==_N

This is my (long-winded) attempt in R:

    # first keep those rows where wt=max_fam(wt)
    maxwt <- by(dat,dat$fam,function(x) max(x[,2]))
    maxwt <- sapply(maxwt,"[[",1)
    maxwt.dat <- data.frame(maxwt=maxwt,fam=as.integer(names(maxwt)))
    dat <- merge(dat,maxwt.dat)
    dat <- dat[dat$wt==dat$maxwt,]
    dat

Now I am stuck - I want to keep either row with fam==4, and have tried playing around with combinations of sample and apply or by, but with no success. I can only find an inefficient for-loop solution:

    # identify those rows with >1 observation
    more <- by(dat,dat$fam,function(x) dim(x)[1])
    more <- sapply(more,"[[",1)
    more.dat <- data.frame(more=more,fam=as.integer(names(more)))
    dat <- merge(dat,more.dat)
    # sample from those for whom more>1
    result <- dat[dat$more==1,]
    for(f in unique(dat$fam[dat$more>1])) {
      rows <- rownames(dat[dat$fam==f,])
      result <- rbind(result,dat[sample(rows,1),])
    }
    result

I am sure that for something so simple in stata to be so complicated in R must indicate ignorance of R on my part, but searches of help files and RSiteSearch haven't led to any better solution. Any suggestions would be most helpful!

Thanks, C.
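The two steps described in the post (keep the rows at the family maximum, then break ties) can be sketched more compactly. This is only one possible reading, assuming that "keep just one, not caring which" may be satisfied by taking the first tied row; the names at_max and result are illustrative and not from the thread.

```r
# Example data from the original post (without the keep column).
dat <- data.frame(fam = c(1, 2, 3, 3, 4, 4, 4),
                  wt  = c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2))

# Step 1: keep rows whose wt equals the maximum wt within their family.
# ave() returns the group-wise maximum aligned with the original rows.
at_max <- dat[dat$wt == ave(dat$wt, dat$fam, FUN = max), ]

# Step 2: where several rows tie for the maximum, keep the first one.
# (Assumption: any tied row is acceptable, so the first is as good as a random one.)
result <- at_max[!duplicated(at_max$fam), ]
result
```

Taking the first tied row makes the result reproducible, whereas the sample() loop in the post picks a tied row at random.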
Re: [R] converting stata's by syntax to R
try

    dat <- dat[order(dat$fam, dat$wt), ]                # sort the data, as Stata's by prefix does
    lis <- by(dat, dat$fam, function(x) x[nrow(x), ])   # equivalent to your Stata command, but returns a list
    do.call(rbind, lis)                                 # bind the list into a matrix-like result

      fam  wt keep
    1   1 1.0    1
    2   2 1.0    1
    3   3 0.6    1
    4   4 0.4    0

=== 2005-08-01 22:24:27, you wrote in your message: ===

> I am struggling with migrating some stata code to R.
> [...]

= = = = = = = = = = = = = = = = = = = =
2005-08-01

--
Department of Sociology
Fudan University
Blog: http://sociology.yculblog.com
Re: [R] converting stata's by syntax to R
Chris Wallace [EMAIL PROTECTED] writes:

> I am struggling with migrating some stata code to R.
> [...]

How about

    unsplit(lapply(split(dat,dat$fam), function(x) seq(length=nrow(x)) == which.max(x$wt)), dat$fam)

or

    do.call(rbind, lapply(split(dat,dat$fam), function(x) x[which.max(x$wt),]))

or (same thing, basically)

    do.call(rbind, by(dat,dat$fam,function(x) x[which.max(x$wt),]))

--
   O__       Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])                             FAX: (+45) 35327907
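The first suggestion returns a logical vector rather than a data frame, which may not be obvious at a glance. A short sketch of how it could be used on the example data follows; the intermediate names flags and keep are illustrative, and seq_len() stands in for seq(length=).

```r
# Example data from the original post (without the keep column).
dat <- data.frame(fam = c(1, 2, 3, 3, 4, 4, 4),
                  wt  = c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2))

# One logical vector per family: TRUE at the position of the first maximum.
flags <- lapply(split(dat, dat$fam),
                function(x) seq_len(nrow(x)) == which.max(x$wt))

# unsplit() puts the per-family pieces back into the original row order.
keep <- unsplit(flags, dat$fam)

dat$keep <- as.numeric(keep)   # 1/0 indicator, as in the original post
dat[keep, ]                    # one row per family
```

Because unsplit() restores the original row positions, this form should also work when the rows of a family are not adjacent.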
Re: [R] converting stata's by syntax to R
Here is one way this can be done

    do.call(rbind, by(dat, list(dat$fam), function(x) {
        if (NROW(x) > 1) return(x[which.max(x$wt), ])
        else return(x)
    }))

and it returns

      fam  wt keep
    1   1 1.0    1
    2   2 1.0    1
    3   3 0.6    1
    4   4 0.4    1

hth,

On Mon, 1 Aug 2005, Chris Wallace wrote:

> I am struggling with migrating some stata code to R.
> [...]
Re: [R] converting stata's by syntax to R
if you also need to create the `keep' vector, then you could try this approach:

    fam <- c(1,2,3,3,4,4,4)
    wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
    dat <- data.frame(fam, wt)
    ###
    keep <- unlist(
        lapply(split(wt, fam), function(x){
            ind <- rep(FALSE, length(x))
            ind[which.max(x)] <- TRUE
            ind
        })
    )
    as.numeric(keep)
    dat[keep, ]

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

- Original Message -
From: Chris Wallace [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Monday, August 01, 2005 4:24 PM
Subject: [R] converting stata's by syntax to R

> I am struggling with migrating some stata code to R.
> [...]
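One point worth noting about the unlist(lapply(split(...))) approach: the pieces come back concatenated in family order, so the resulting vector lines up with the rows only when the data are already sorted by fam, as they are in the example. A sketch for the unsorted case, using made-up data and illustrative names (rows, best), might look like this:

```r
# Hypothetical unsorted data: the two rows of family 2 are not adjacent.
fam <- c(2, 1, 2)
wt  <- c(0.5, 1, 0.9)

# For each family, find the original row index of its first maximum wt.
rows <- seq_along(wt)
best <- tapply(rows, fam, function(i) i[which.max(wt[i])])

# TRUE for the chosen rows, in the original row order.
keep <- rows %in% best
keep
```

Working with row indices rather than the values themselves keeps the flags tied to the original positions, whatever the row order.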
Re: [R] converting stata's by syntax to R
On Mon, 1 Aug 2005, Chris Wallace wrote:

> I am struggling with migrating some stata code to R.
> [...]
> The stata code I would use is
>
>     bys fam (wt): keep if _n==_N

A reasonably direct translation of the Stata code is

    index <- order(fam, -wt)
    keep <- !duplicated(fam[index])
    dat <- data.frame(fam=fam[index], wt=wt[index], keep=keep)

which sorts wt into decreasing order within family, then keeps the first observation in each family. This is less general than solutions other people have given, but I'd expect it to be faster for large data sets. 'keep' ends up TRUE/FALSE rather than 1/0; if this is a problem use as.numeric() on it.

	-thomas
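The same order()/duplicated() idea can be applied to a whole data frame, so that any extra columns travel with the selected rows. The helper below is a sketch only: the name keep_max_row and its arguments are hypothetical, and it assumes the value column is numeric so that it can be negated.

```r
# Hypothetical helper: keep one row per group, the one with the largest value.
# `group` and `value` are column names; ties go to the row that comes first.
keep_max_row <- function(df, group, value) {
  index  <- order(df[[group]], -df[[value]])   # decreasing value within group
  sorted <- df[index, , drop = FALSE]
  sorted[!duplicated(sorted[[group]]), , drop = FALSE]
}

# Example data from the original post.
dat <- data.frame(fam  = c(1, 2, 3, 3, 4, 4, 4),
                  wt   = c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2),
                  keep = c(1, 1, 1, 0, 1, 0, 0))
res <- keep_max_row(dat, "fam", "wt")
res
```

Since order() leaves unresolved ties in their original order, the first of several tied rows is the one retained.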