Re: [R] converting stata's by syntax to R

2005-08-02 Thread Chris Wallace
Chris Wallace [EMAIL PROTECTED] writes:

> I am struggling with migrating some stata code to R

Thanks to all who replied.  It was very helpful to see a combination
of more direct stata-R translations and more R-ish code.  which.max()
solves my problem this time, but learning about split(), unsplit() and
duplicated() should make such problems fewer in the long run.

C.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] converting stata's by syntax to R

2005-08-01 Thread Chris Wallace
I am struggling with migrating some stata code to R.  I have a data
frame containing, sometimes, repeat observations (rows) of the same
family.  I want to keep only one observation per family, selecting
that observation according to some other variable.  An example data
frame is:

# construct example data
fam <- c(1,2,3,3,4,4,4)
wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
keep <- c(1,1,1,0,1,0,0)
dat <- as.data.frame(cbind(fam,wt,keep))
dat

I want to keep the observation for which wt is a maximum, and where
this doesn't identify a unique observation, to keep just one anyway,
not caring which.  Those observations are indicated above by keep==1.
(Note, keep <- c(1,1,1,0,0,1,0) would be fine too, but not
c(1,1,1,0,0,0,1)).

The stata code I would use is
bys fam (wt): keep if _n==_N

This is my (long-winded) attempt in R:

# first keep those rows where wt=max_fam(wt)
maxwt <- by(dat,dat$fam,function(x) max(x[,2]))
maxwt <- sapply(maxwt,"[[",1)
maxwt.dat <- data.frame(maxwt=maxwt,fam=as.integer(names(maxwt)))
dat <- merge(dat,maxwt.dat)
dat <- dat[dat$wt==dat$maxwt,]
dat

Now I am stuck - I want to keep either row with fam==4, and have tried
playing around with combinations of sample and apply or by, but with
no success.  I can only find an inefficient for-loop solution:
  
# identify those rows with >1 observation
more <- by(dat,dat$fam,function(x) dim(x)[1])
more <- sapply(more,"[[",1)
more.dat <- data.frame(more=more,fam=as.integer(names(more)))
dat <- merge(dat,more.dat)

# sample from those for whom more>1
result <- dat[dat$more==1,]
for(f in unique(dat$fam[dat$more>1])) {
  rows <- rownames(dat[dat$fam==f,])
  result <- rbind(result,dat[sample(rows,1),])
}
result

I am sure that for something so simple in stata to be so complicated
in R must indicate ignorance of R on my part, but searches of help
files and RSiteSearch haven't led to any better solution.

Any suggestions would be most helpful! Thanks, C.



Re: [R] converting stata's by syntax to R

2005-08-01 Thread ronggui
try
dat <- dat[order(dat$fam, dat$wt), ]
# sort the data, as Stata's "bys fam (wt):" prefix does
lis <- by(dat, dat$fam, function(x) x[nrow(x), ])
# equivalent to your Stata command, but returns a list
do.call(rbind, lis)
# bind the list into a single data frame
  fam  wt keep
1   1 1.0    1
2   2 1.0    1
3   3 0.4    0
4   4 0.4    0
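
To make the correspondence with the Stata line explicit, here is a small self-contained sketch of the same sort-then-take-last idea. It rebuilds the example data from the original post; the names `sorted`, `last_row` and `res` are made up for illustration and do not come from the thread.

```r
# Example data from the original post
dat <- data.frame(fam  = c(1, 2, 3, 3, 4, 4, 4),
                  wt   = c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2),
                  keep = c(1, 1, 1, 0, 1, 0, 0))

# Stata: bys fam (wt)  ->  sort by fam, then by wt within fam
sorted <- dat[order(dat$fam, dat$wt), ]

# Stata: keep if _n == _N  ->  keep the last row of each fam group
last_row <- function(x) x[nrow(x), ]   # hypothetical helper
res <- do.call(rbind, lapply(split(sorted, sorted$fam), last_row))
res
```

Because wt is sorted in increasing order within each family, the last row of each group is one with the maximum wt; when there are ties, which of the tied rows comes last depends on the sort.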



2005-08-01

--
Department of Sociology
Fudan University

Blog:http://sociology.yculblog.com


Re: [R] converting stata's by syntax to R

2005-08-01 Thread Peter Dalgaard
How about

unsplit(lapply(split(dat,dat$fam), 
   function(x) seq(length=nrow(x)) == which.max(x$wt)),
dat$fam)

or 

do.call(rbind, lapply(split(dat,dat$fam),
function(x) x[which.max(x$wt),]))

or (same thing, basically)

do.call(rbind, by(dat,dat$fam,function(x) x[which.max(x$wt),]))
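
The first suggestion differs from the other two in that it returns a logical vector aligned with the rows of dat rather than the selected rows themselves. A minimal sketch of how that vector might be used, rebuilding the example data; the name `flag` is invented here, and seq_len() is used in place of seq(length=) as a matter of taste only.

```r
dat <- data.frame(fam = c(1, 2, 3, 3, 4, 4, 4),
                  wt  = c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2))

# One logical flag per row, in the original row order:
# TRUE for the first row attaining the maximum wt within its family
flag <- unsplit(lapply(split(dat, dat$fam),
                       function(x) seq_len(nrow(x)) == which.max(x$wt)),
                dat$fam)

flag          # logical vector aligned with the rows of dat
dat[flag, ]   # one row per family
```

which.max() returns the position of the first maximum, so ties within a family are resolved in favour of the row that appears first.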

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] converting stata's by syntax to R

2005-08-01 Thread Jean Eid
Here is one way this can be done:
do.call(rbind, by(dat, list(dat$fam), function(x) {
    if (NROW(x) > 1) return(x[which.max(x$wt), ])
    else return(x)
}))


and it returns
  fam  wt keep
1   1 1.0    1
2   2 1.0    1
3   3 0.6    1
4   4 0.4    1



hth,




Re: [R] converting stata's by syntax to R

2005-08-01 Thread Dimitris Rizopoulos
if you also need to create the `keep' vector, then you could try this 
approach:

fam <- c(1,2,3,3,4,4,4)
wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
dat <- data.frame(fam, wt)
###
keep <- unlist( lapply(split(wt, fam), function(x){
    ind <- rep(FALSE, length(x))
    ind[which.max(x)] <- TRUE
    ind
}) )
as.numeric(keep)
dat[keep, ]
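
One caveat, offered as a hedged aside: unlist(lapply(split(...))) returns the flags grouped by family, so they line up with the rows only when the data are already sorted by fam, as in the example. A sketch of an order-independent variant using base R's ave(); the shuffled data frame `shuf` is invented for illustration and is not from the thread.

```r
# Same values as the example, but with families interleaved (made-up order)
shuf <- data.frame(fam = c(4, 1, 3, 4, 2, 3, 4),
                   wt  = c(0.4, 1, 0.4, 0.2, 1, 0.6, 0.4))

# ave() applies FUN within each family and puts the results back in the
# original row positions; the flags come back as 0/1 because wt is numeric
keep <- ave(shuf$wt, shuf$fam,
            FUN = function(x) seq_along(x) == which.max(x))

shuf[keep == 1, ]
```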


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm




Re: [R] converting stata's by syntax to R

2005-08-01 Thread Thomas Lumley
On Mon, 1 Aug 2005, Chris Wallace wrote:

> The stata code I would use is
> bys fam (wt): keep if _n==_N

A reasonably direct translation of the Stata code is

   index <- order(fam, -wt)
   keep <- !duplicated(fam[index])
   dat <- data.frame(fam=fam[index], wt=wt[index], keep=keep)

which sorts wt into decreasing order within family, then keeps the first 
observation in each family.

This is less general than solutions other people have given, but I'd 
expect it to be faster for large data sets. 'keep' ends up TRUE/FALSE 
rather than 1/0; if this is a problem use as.numeric() on it.
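
If the rows need to stay in their original order rather than being re-sorted, the flags can be scattered back through the same index. This is only a sketch under that assumption, not something proposed in the thread; the name `first` is invented here.

```r
fam <- c(1, 2, 3, 3, 4, 4, 4)
wt  <- c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2)
dat <- data.frame(fam, wt)

index <- order(fam, -wt)           # within family, largest wt first
first <- !duplicated(fam[index])   # TRUE for the first row of each family

# Put each flag back at the original position of the row it describes
keep <- logical(length(fam))
keep[index] <- first

dat[keep, ]
```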

-thomas
