[ Arrggh, not reply, but reply to all; crossing my fingers again. Sorry, Peter! ]
Hmm, I don't think you need a retain statement.

if first.patientID ;
or
if last.patientID ;

ought to do it.

I must admit the SAS form is actually better than the Vilno version, a bit more concise:

if ( not firstrow(patientID) ) deleterow ;

Ah well.

**********************************

For the folks asking for the location of the software (I know I posted it, but it didn't connect to the thread, and you get a huge number of posts each day, sorry):

Vilno, find at http://code.google.com/p/vilno
DAP & PSPP, find at http://directory.fsf.org/math/stats
Awk, find at lots of places, e.g. http://www.gnu.org/software/gawk/gawk.html

Anything else? DAP & PSPP are hard to find; I'm sure there's more out there!
What about MDX? Nahh, not really the right problem domain. Nobody uses MDX for this stuff.

******************************************************

If my examples using clinical trial data are boring and hard to understand for those who asked for examples (and presumably don't work in clinical trials), let me know. Some of the other examples I'm reading about are quite interesting. It doesn't help that clinical trial databases cannot be made public, and making a fake database would take a lot of time.

The irony is, even with my deep understanding of data preparation in clinical trials, the pharmas still don't want to give me a job (because I was gone for many years).

********************************************************

Let's see if this post works: thanks to the folks who gave me advice on how to properly respond to a post within a thread. (Although the thread in my gmail account is only a subset of the posts visible in the archives.) Crossing my fingers ....
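For readers who don't use SAS, the first.patientID / last.patientID idea from the top of this message can be imitated in R roughly as below. This is only a minimal sketch: the data frame `visits` and its columns `patientID` and `visit` are made-up names for illustration, and the flags are built with duplicated() rather than anything SAS or Vilno actually does internally.

```r
# Toy data; 'visits', 'patientID' and 'visit' are hypothetical names.
visits <- data.frame(
  patientID = c(101, 101, 101, 102, 103, 103),
  visit     = c(1, 2, 3, 1, 1, 2)
)

# Sort by patient, then visit, as a SAS BY statement would require.
visits <- visits[order(visits$patientID, visits$visit), ]

# Emulate first.patientID and last.patientID as logical flags.
first_flag <- !duplicated(visits$patientID)
last_flag  <- !duplicated(visits$patientID, fromLast = TRUE)

first_rows <- visits[first_flag, ]   # roughly: if first.patientID ;
last_rows  <- visits[last_flag, ]    # roughly: if last.patientID ;
```

Keeping the flags as separate logical vectors, rather than subsetting in one step, makes it easy to reuse them for other within-patient logic.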
On 6/10/07, Peter Dalgaard wrote:
> Douglas Bates wrote:
> > Frank Harrell indicated that it is possible to do a lot of difficult
> > data transformation within R itself if you try hard enough but that
> > sometimes means working against the S language and its "whole object"
> > view to accomplish what you want and it can require knowledge of
> > subtle aspects of the S language.
> >
> Actually, I think Frank's point was subtly different: It is *because* of
> the differences in view that it sometimes seems difficult to find the
> way to do something in R that is apparently straightforward in SAS.
> I.e. the solutions exist and are often elegant, but may require some
> lateral thinking.
>
> Case in point: Finding the first or the last observation for each
> subject when there are multiple records for each subject. The SAS way
> would be a datastep with IF-THEN-DELETE, and a RETAIN statement so that
> you can compare the subject ID with the one from the previous record,
> working with data that are sorted appropriately.
>
> You can do the same thing in R with a for loop, but there are better
> ways e.g.
> subset(df,!duplicated(ID)), and subset(df, rev(!duplicated(rev(ID)))), or
> maybe
> do.call("rbind",lapply(split(df,df$ID), head, 1)), resp. tail. Or
> something involving aggregate(). (The latter approaches generalize
> better to other within-subject functionals like cumulative doses, etc.).
>
> The hardest cases that I know of are the ones where you need to turn one
> record into many, such as occurs in survival analysis with
> time-dependent, piecewise constant covariates. This may require
> "transposing the problem", i.e. for each interval you find out which
> subjects contribute and with what, whereas the SAS way would be a
> within-subject loop over intervals containing an OUTPUT statement.
>
> Also, there are some really weird data formats, where e.g. the input
> format is different in different records. Back in the 80's when
> punched-card input was still common, it was quite popular to have one
> card with background information on a patient plus several cards
> detailing visits, and you'd get a stack of cards containing both kinds.
> In R you would most likely split on the card type using grep() and then
> read the two kinds separately and merge() them later.
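The mixed card-stack case at the end of the quoted message can be sketched in R roughly as follows. This is a minimal illustration, not Peter's code: the card layout (a type letter, an ID, then two fields), the object names `cards`, `patients`, `visits`, `merged`, and the column names are all assumptions made up for the example.

```r
# Made-up stack of "cards": type P = patient background, type V = visit.
# The layout (type, ID, then two fields) is assumed for illustration only.
cards <- c(
  "P 101 M 54",
  "V 101 1 120",
  "V 101 2 118",
  "P 102 F 61",
  "V 102 1 135"
)

# Split on the card type using grep().
patient_lines <- grep("^P", cards, value = TRUE)
visit_lines   <- grep("^V", cards, value = TRUE)

# Read the two kinds separately; each kind has its own columns.
patients <- read.table(text = patient_lines,
                       col.names = c("type", "ID", "sex", "age"),
                       stringsAsFactors = FALSE)
visits   <- read.table(text = visit_lines,
                       col.names = c("type", "ID", "visit", "sbp"),
                       stringsAsFactors = FALSE)

# Drop the card-type column and merge background info onto each visit.
merged <- merge(patients[, -1], visits[, -1], by = "ID")
```

The result has one row per visit card, with the patient's background columns repeated on each of that patient's visits.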
