Re: [R] Can R replicate this data manipulation in SAS?

Bert Gunter Thu, 21 Apr 2011 07:34:57 -0700

Folks:

It is perhaps worth noting that this is probably a Type III error: the right
answer to the wrong question. The right question would be: what data
structures and analysis strategy are appropriate in R? As usual, different
language architectures mean that different paradigms should be used to best
fit a language's strengths and weaknesses. Direct translations do not
necessarily do this.
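
For what it is worth, here is a minimal base-R sketch of one such strategy
for the HAART example quoted below. It is only an assumption about what an
R-native approach could look like, not a recommended solution: treat every
start and stop as a +1/-1 event, sum the events per id and date, and take
running totals within id. The names events, net and regimens are purely
illustrative and do not come from the code in this thread.

```r
# Test data from the thread: one row per drug interval.
dat <- data.frame(
  id    = c(1004, 1004, 1004, 1004, 1004, 1004, 9999, 9999),
  drug  = c("NRTI", "NRTI", "NRTI", "PI", "NRTI", "NRTI", "PI", "NNRTI"),
  start = as.Date(c("1995-07-24", "1995-11-20", "1996-01-10", "1996-05-09",
                    "1996-06-01", "1996-07-01", "2003-01-02", "2006-04-05")),
  stop  = as.Date(c("1999-01-05", "1995-12-10", "1999-01-05", "1997-11-16",
                    "1997-02-01", "1997-03-01", NA, "2009-07-08")),
  stringsAsFactors = FALSE
)

# One row per change event: +1 when a drug starts, -1 when it stops.
# Open-ended intervals (missing stop) contribute no stop event.
events <- rbind(
  data.frame(id = dat$id, drug = dat$drug, date = dat$start, change = 1),
  data.frame(id = dat$id, drug = dat$drug, date = dat$stop,  change = -1)
)
events <- events[!is.na(events$date), ]

# Spread the change into one numeric column per drug class.
classes <- c("NRTI", "NNRTI", "PI")
for (cl in classes) {
  events[[cl]] <- events$change * (events$drug == cl)
}

# Net change per id and date, then running totals within id.
net <- aggregate(events[classes],
                 by = list(id = events$id, date = events$date),
                 FUN = sum)
net <- net[order(net$id, net$date), ]
for (cl in classes) {
  net[[cl]] <- ave(net[[cl]], net$id, FUN = cumsum)
}

# Regimen number, total drugs and the HAART rule used in the thread.
net$regimen  <- ave(seq_along(net$id), net$id, FUN = seq_along)
net$alldrugs <- rowSums(net[classes])
net$HAART    <- with(net, as.integer(
  (NRTI >= 3 & NNRTI == 0 & PI == 0) |
  (NRTI >= 2 & (NNRTI >= 1 | PI >= 1)) |
  (NRTI == 1 & NNRTI >= 1 & PI >= 1)))

# A regimen stops the day before the next change for the same id.
next_date     <- ave(as.numeric(net$date), net$id,
                     FUN = function(d) c(d[-1], NA))
net$stop_date <- as.Date(next_date - 1, origin = "1970-01-01")

# Keep only periods in which at least one drug is being taken.
regimens <- net[net$alldrugs > 0, ]
regimens
```

The point of the sketch is that the running state kept by RETAIN in the SAS
data step becomes a grouped cumsum() over a table of events, so no explicit
row-by-row loop over the records is needed.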


Of course this takes experience and work ... as dictated by the "no free
lunch" principle of thermodynamics.

Cheers,
Bert

On Wed, Apr 20, 2011 at 2:53 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:

> Oops, I missed the HAART part. Fortunately that translates
> straightforwardly:
>
> n.dat$HAART <- with(n.dat, ifelse((NRTI >= 3 & NNRTI==0 & PI==0) |
>                                  (NRTI >= 2 & (NNRTI >= 1 | PI >= 1)) |
>                                  (NRTI == 1 & NNRTI >= 1 & PI >= 1),
>                                  1, 0))
>
> Best,
> Ista
>
> On Wed, Apr 20, 2011 at 5:22 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
> > I think this is kind of like asking "will your Land Rover make it up
> > my driveway?", but I'll assume the question was asked in all
> > seriousness.
> >
> > Here is one solution:
> >
> > ## **** Read in test data;
> > dat <- read.table(textConnection("id    drug      start       stop
> > 1004    NRTI     07/24/95    01/05/99
> > 1004    NRTI     11/20/95 12/10/95
> > 1004    NRTI     01/10/96    01/05/99
> > 1004    PI       05/09/96    11/16/97
> > 1004    NRTI     06/01/96    02/01/97
> > 1004    NRTI     07/01/96    03/01/97
> > 9999    PI       01/02/03    NA
> > 9999    NNRTI    04/05/06    07/08/09"), header=TRUE)
> > closeAllConnections()
> >
> > dat$start <- as.Date(dat$start, format = "%m/%d/%y")
> > dat$stop <- as.Date(dat$stop, format = "%m/%d/%y")
> >
> > ## **** Reshape data into series with 1 date rather than separate starts and stops;
> >
> > library(reshape)  # melt() and cast()
> > library(plyr)     # ddply() and summarize(), used below
> >
> > m.dat <- melt(dat, id = c("id", "drug"))
> > m.dat <- m.dat[order(m.dat$id, m.dat$value),]
> > m.dat$variable <- ifelse(m.dat$variable == "start", 1, -1)
> > names(m.dat) <-  c("id", "drug", "value", "date")
> > m.dat
> >
> > ## **** Get regimen information plus start and stop dates;
> >
> > n.dat <- cast(m.dat, id + date ~ drug, fun.aggregate=sum,
> >               margins="grand_col")
> > for (i in names(n.dat)[-c(1:2)]) {
> >     n.dat[i] <- cumsum(n.dat[i])
> > }
> > n.dat <- ddply(n.dat, .(id), transform,
> >      regimen = 1:length(id))
> > n.dat
> >
> > ssd.dat <- ddply(n.dat, .(id), summarize,
> >                id = id[-1],
> >                regimen = regimen[-length(regimen)],
> >                 start_date = date[-length(date)],
> >                stop_date = date[-1])
> > ssd.dat
> >
> > ## **** Merge data to create regimens dataset;
> > all.dat <- merge(n.dat[-2], ssd.dat)
> > all.dat <- all.dat[order(all.dat$id, all.dat$regimen), c("id",
> > "start_date", "stop_date", "regimen", "NRTI", "NNRTI", "PI",
> > "X.all.")]
> > all.dat
> >
> >
> > Best,
> > Ista
> >
> >
> >
> > On Wed, Apr 20, 2011 at 2:59 PM, Ted Harding <ted.hard...@wlandres.net> wrote:
> >> [*** PLEASE NOTE: I am sending this message on behalf of
> >>  Paul Miller:
> >>  Paul Miller <pjmiller...@yahoo.com>
> >>  (to whom this message has also been copied). He has been
> >>  trying to send it, but it has never got through. Please
> >>  do  not reply to me, but either to the list and/or to Paul
> >>  at that address ***]
> >> ==========================================================
> >> Hello Everyone,
> >>
> >> I'm learning R and am trying to get a better sense of what it will and
> >> will not do. I'm hearing in some places that R may not be able to
> >> accomplish all of the data manipulation tasks that SAS can. In others,
> >> I'm hearing that R can do pretty much any data manipulation that SAS can
> >> but the way in which it does so is likely to be quite different.
> >>
> >> Below is some SAS syntax that codes Highly Active Antiretroviral Therapy
> >> (HAART) regimens in HIV patients by retaining the values of variables.
> >> Interspersed between the bits of code are printouts of data sets that
> >> are created in the process of coding. I'm hoping this will come through
> >> clearly and that people will be able to see exactly what is being done.
> >> Basically, the code keeps track of how many drugs people are on and what
> >> types of drugs they are taking during specific periods of time and
> >> decides whether that constitutes HAART or not.
> >>
> >> To me, this is a pretty tricky data manipulation in SAS. Is there any
> >> way to get the equivalent result in R?
> >>
> >> Thanks,
> >>
> >> Paul
> >>
> >>
> >> **** SAS syntax for coding HAART in HIV patients;
> >> **** Read in test data;
> >>
> >> data haart;
> >> input id drug_class $ start_date :mmddyy. stop_date :mmddyy.;
> >> format start_date stop_date mmddyy8.;
> >> cards;
> >> 1004 NRTI  07/24/95 01/05/99
> >> 1004 NRTI  11/20/95 12/10/95
> >> 1004 NRTI  01/10/96 01/05/99
> >> 1004 PI    05/09/96 11/16/97
> >> 1004 NRTI  06/01/96 02/01/97
> >> 1004 NRTI  07/01/96 03/01/97
> >> 9999 PI    01/02/03 .
> >> 9999 NNRTI 04/05/06 07/08/09
> >> ;
> >> run;
> >>
> >> proc print data=haart;
> >> run;
> >>
> >>               drug_      start_       stop_
> >> Obs     id     class        date        date
> >> 1     1004    NRTI     07/24/95    01/05/99
> >> 2     1004    NRTI     11/20/95    12/10/95
> >> 3     1004    NRTI     01/10/96    01/05/99
> >> 4     1004    PI       05/09/96    11/16/97
> >> 5     1004    NRTI     06/01/96    02/01/97
> >> 6     1004    NRTI     07/01/96    03/01/97
> >> 7     9999    PI       01/02/03           .
> >> 8     9999    NNRTI    04/05/06    07/08/09
> >>
> >> **** Reshape data into series with 1 date rather than separate starts and stops;
> >>
> >> data changes (drop=start_date stop_date where=(not missing(date)));
> >> set haart;
> >> date = start_date;
> >> change =  1;
> >> output;
> >> date =  stop_date;
> >> change = -1;
> >> output;
> >> format date mmddyy10.;
> >> run;
> >>
> >> proc sort data=changes;
> >> by id date;
> >> run;
> >>
> >> proc print data=changes;
> >> run;
> >>
> >>               drug_
> >> Obs     id     class          date    change
> >>  1    1004    NRTI     07/24/1995       1
> >>  2    1004    NRTI     11/20/1995       1
> >>  3    1004    NRTI     12/10/1995      -1
> >>  4    1004    NRTI     01/10/1996       1
> >>  5    1004    PI       05/09/1996       1
> >>  6    1004    NRTI     06/01/1996       1
> >>  7    1004    NRTI     07/01/1996       1
> >>  8    1004    NRTI     02/01/1997      -1
> >>  9    1004    NRTI     03/01/1997      -1
> >> 10    1004    PI       11/16/1997      -1
> >> 11    1004    NRTI     01/05/1999      -1
> >> 12    1004    NRTI     01/05/1999      -1
> >> 13    9999    PI       01/02/2003       1
> >> 14    9999    NNRTI    04/05/2006       1
> >> 15    9999    NNRTI    07/08/2009      -1
> >>
> >> **** Get regimen information plus start and stop dates;
> >>
> >> data cumulative(drop=drug_class change stop_date)
> >>     stop_dates(keep=id regimen stop_date);
> >> set changes;
> >> by id date;
> >>
> >> if first.id then do;
> >>  regimen = 0;
> >>  NRTI = 0;
> >>  NNRTI = 0;
> >>  PI = 0;
> >> end;
> >>
> >> if drug_class = 'NNRTI' then NNRTI + change;
> >> else if drug_class = 'NRTI' then NRTI + change;
> >> else if drug_class = 'PI  ' then PI + change;
> >>
> >> if last.date then do;
> >>  stop_date = date - 1;
> >> if regimen then output stop_dates;
> >>   regimen + 1;
> >>  alldrugs = NNRTI + NRTI + PI;
> >>  HAART = (NRTI >= 3 AND NNRTI=0 AND PI=0) OR
> >>    (NRTI >= 2 AND (NNRTI >= 1 OR PI >= 1)) OR
> >>    (NRTI = 1 AND NNRTI >= 1 AND PI >= 1);
> >> output cumulative;
> >> end;
> >>
> >> format stop_date mmddyy10.;
> >> run;
> >>
> >> proc print data=cumulative;
> >> run;
> >>
> >> Obs     id           date    regimen    NRTI    NNRTI    PI    alldrugs    HAART
> >>  1    1004    07/24/1995        1        1       0       0        1         0
> >>  2    1004    11/20/1995        2        2       0       0        2         0
> >>  3    1004    12/10/1995        3        1       0       0        1         0
> >>  4    1004    01/10/1996        4        2       0       0        2         0
> >>  5    1004    05/09/1996        5        2       0       1        3         1
> >>  6    1004    06/01/1996        6        3       0       1        4         1
> >>  7    1004    07/01/1996        7        4       0       1        5         1
> >>  8    1004    02/01/1997        8        3       0       1        4         1
> >>  9    1004    03/01/1997        9        2       0       1        3         1
> >> 10    1004    11/16/1997       10        2       0       0        2         0
> >> 11    1004    01/05/1999       11        0       0       0        0         0
> >> 12    9999    01/02/2003        1        0       0       1        1         0
> >> 13    9999    04/05/2006        2        0       1       1        2         0
> >> 14    9999    07/08/2009        3        0       0       1        1         0
> >>
> >> proc print data=stop_dates;
> >> run;
> >>
> >> Obs     id     regimen     stop_date
> >>  1    1004        1      11/19/1995
> >>  2    1004        2      12/09/1995
> >>  3    1004        3      01/09/1996
> >>  4    1004        4      05/08/1996
> >>  5    1004        5      05/31/1996
> >>  6    1004        6      06/30/1996
> >>  7    1004        7      01/31/1997
> >>  8    1004        8      02/28/1997
> >>  9    1004        9      11/15/1997
> >> 10    1004       10      01/04/1999
> >> 11    9999        1      04/04/2006
> >> 12    9999        2      07/07/2009
> >>
> >> **** Merge data to create regimens dataset;
> >>
> >> data regimens;
> >> retain id start_date stop_date;
> >> merge cumulative(rename=(date=start_date)) stop_dates;
> >> by id regimen;
> >> if alldrugs;
> >> run;
> >>
> >> proc print data=regimens;
> >> run;
> >>
> >> Obs     id     start_date     stop_date    regimen    NRTI    NNRTI    PI    alldrugs    HAART
> >>  1    1004    07/24/1995    11/19/1995        1        1       0       0        1         0
> >>  2    1004    11/20/1995    12/09/1995        2        2       0       0        2         0
> >>  3    1004    12/10/1995    01/09/1996        3        1       0       0        1         0
> >>  4    1004    01/10/1996    05/08/1996        4        2       0       0        2         0
> >>  5    1004    05/09/1996    05/31/1996        5        2       0       1        3         1
> >>  6    1004    06/01/1996    06/30/1996        6        3       0       1        4         1
> >>  7    1004    07/01/1996    01/31/1997        7        4       0       1        5         1
> >>  8    1004    02/01/1997    02/28/1997        8        3       0       1        4         1
> >>  9    1004    03/01/1997    11/15/1997        9        2       0       1        3         1
> >> 10    1004    11/16/1997    01/04/1999       10        2       0       0        2         0
> >> 11    9999    01/02/2003    04/04/2006        1        0       0       1        1         0
> >> 12    9999    04/05/2006    07/07/2009        2        0       1       1        2         0
> >> 13    9999    07/08/2009             .        3        0       0       1        1         0
> >>
> >> ==========================================================
> >>
> >> Paul Miller
> >> Paul Miller <pjmiller...@yahoo.com>
> >>
> >>
> >> --------------------------------------------------------------------
> >> E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
> >> Fax-to-email: +44 (0)870 094 0861
> >> Date: 20-Apr-11                                       Time: 19:59:21
> >> ------------------------------ XFMail ------------------------------
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >
> >
> > --
> > Ista Zahn
> > Graduate student
> > University of Rochester
> > Department of Clinical and Social Psychology
> > http://yourpsyche.org
> >
>
>



-- 
"Men by nature long to get on to the ultimate truths, and will often be
impatient with elementary studies or fight shy of them. If it were possible
to reach the ultimate truths without the elementary studies usually prefixed
to them, these would not be preparatory studies but superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml


