On Dec 3, 2014, at 2:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> Two alternative approaches - mutate() vs. sapply() - were used to get the
> desired results (i.e., creating a new column of the most recent date from 4
> dates) with help from Arun and Mark on this forum. I now find that the two
> data objects (created using two different approaches) are not identical
> although results are exactly the same.
>
> identical(new1, new2)
> [1] FALSE
>
You should have examined the output from dput() on both objects. I think you
will find that dplyr is adding new attributes. Notice that the "mutate()-ed"
object now has this class:

class = c("rowwise_df", "tbl_df", "tbl", "data.frame")

Moral: Never rely on the print representation.

-- David.

> Please see the reproducible example below.
>
> I don't understand why the code returns FALSE here. Any hints/comments will
> be appreciated.
>
> Thanks,
>
> Pradip
>
> ############## reproducible example ##############
> library(dplyr)
> # data object - description
>
> temp <- "id mrjdate cocdate inhdate haldate
> 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2 NA NA NA NA
> 3 2009-10-24 NA 2011-10-13 NA
> 4 2007-10-10 NA NA NA
> 5 2006-09-01 2005-08-10 NA NA
> 6 2007-09-04 2011-10-05 NA NA
> 7 2005-10-25 NA NA 2011-11-04"
>
> # read the data object
>
> example.data <- read.table(textConnection(temp),
>                            colClasses=c("character", "Date", "Date", "Date", "Date"),
>                            header=TRUE, as.is=TRUE
>                            )
>
> # create a new column - dplyr solution (Acknowledgement: Arun)
>
> new1 <- example.data %>%
>   rowwise() %>%
>   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>                              na.rm=TRUE),
>                          origin='1970-01-01'))
>
> # create a new column - Base R solution (Acknowledgement: Mark Sharp)
>
> new2 <- example.data
> new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
>   if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate',
>                                            'haldate')])))) {
>     max_d <- NA
>   } else {
>     max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate',
>                                             'haldate')]), na.rm = TRUE)
>   }
>   max_d}),
>   origin = "1970-01-01")
>
> identical(new1, new2)
>
> # print records
>
> print (new1); print(new2)
>
> Pradip K. Muhuri
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
> Sent: Sunday, November 09, 2014 6:11 AM
> To: 'Mark Sharp'
> Cc: r-help@r-project.org
> Subject: Re: [R] Getting the most recent dates in a new column from dates in
> four columns using the dplyr package (mutate verb)
>
> Hi Mark,
>
> Your code has also given me the results I expected. Thank you so much for
> your help.
>
> Regards,
>
> Pradip
>
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
> -----Original Message-----
> From: Mark Sharp [mailto:msh...@txbiomed.org]
> Sent: Sunday, November 09, 2014 3:01 AM
> To: Muhuri, Pradip (SAMHSA/CBHSQ)
> Cc: r-help@r-project.org
> Subject: Re: [R] Getting the most recent dates in a new column from dates in
> four columns using the dplyr package (mutate verb)
>
> Pradip,
>
> mutate() works on the entire column as a vector so that you find the maximum
> of the entire data set.
>
> I am almost certain there is some nice way to handle this, but the sapply()
> function is a standard approach.
>
> max() does not want a dataframe thus the use of unlist().
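A small base-R sketch of Mark's point may help here. It is not code from the thread: the toy data frame and the names `toy`, `overall`, and `latest` are made up for illustration, and pmax() is swapped in as one possible vectorized alternative to the sapply() loop. The idea is that max() collapses whole columns into a single value, while pmax() compares the columns element by element and so gives one result per row.

```r
# Toy data (hypothetical; only two of the four date columns, three rows)
toy <- data.frame(
  id      = c("1", "2", "3"),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24")),
  cocdate = as.Date(c("2008-07-18", NA, NA)),
  stringsAsFactors = FALSE
)

# max() sees the columns as whole vectors and returns ONE date,
# the latest date anywhere in the data
overall <- max(toy$mrjdate, toy$cocdate, na.rm = TRUE)

# pmax() works element by element, so each row gets its own latest date;
# a row whose dates are all NA stays NA
toy$latest <- pmax(toy$mrjdate, toy$cocdate, na.rm = TRUE)

print(overall)
print(toy)
```

pmax() keeps the Date class of its first argument, so no as.Date(..., origin = "1970-01-01") conversion should be needed afterwards. Whether this is preferable to rowwise() or sapply() for the real data is a judgment call.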
>
> Using your definition of data1:
>
> data3 <- data1
> data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
>   if (all(is.na(unlist(data1[row, -1])))) {
>     max_d <- NA
>   } else {
>     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
>   }
>   max_d}),
>   origin = "1970-01-01")
>
> data3
>   id    mrjdate    cocdate    inhdate    haldate    oidflag
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
> R. Mark Sharp, Ph.D.
> Director of Primate Records Database
> Southwest National Primate Research Center
> Texas Biomedical Research Institute
> P.O. Box 760549
> San Antonio, TX 78245-0549
> Telephone: (210)258-9476
> e-mail: msh...@txbiomed.org
>
> NOTICE: This E-Mail (including attachments) is confidential and may be
> legally privileged. It is covered by the Electronic Communications Privacy
> Act, 18 U.S.C.2510-2521. If you are not the intended recipient, you are
> hereby notified that any retention, dissemination, distribution or copying of
> this communication is strictly prohibited. Please reply to the sender that
> you have received this message in error, then delete it.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
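
To make the advice above concrete, here is a minimal base-R sketch of how one might track down why identical() returns FALSE for two data frames that print the same. It does not load dplyr. As an assumption, the extra classes quoted above are attached by hand to a toy data frame, and the names `a`, `b`, `b_plain`, and `b_renamed` are made up for illustration. It also checks column names, because in the posted example the new column is called oldflag in new1 and oiddate in new2, and a name mismatch alone is enough to make identical() return FALSE.

```r
# Two toy data frames holding the same values (hypothetical data)
a <- data.frame(id = c("1", "2"),
                d  = as.Date(c("2004-11-04", NA)),
                stringsAsFactors = FALSE)
b <- a

# Assumption: mimic by hand the class vector reported for the mutate() result
class(b) <- c("rowwise_df", "tbl_df", "tbl", "data.frame")

same_with_classes <- identical(a, b)        # FALSE: the class attribute differs
class(a)
class(b)

# Strip the extra classes and compare again
b_plain <- b
class(b_plain) <- "data.frame"
same_after_strip <- identical(a, b_plain)   # TRUE: values and attributes now match

# A differently named column also breaks identical()
b_renamed <- b_plain
names(b_renamed)[2] <- "d2"
same_after_rename <- identical(a, b_renamed)  # FALSE
setdiff(names(a), names(b_renamed))           # the name present only in a

# dput(a); dput(b) would show the full structure, attributes included
```

On the real objects, comparing class(new1) with class(new2) and names(new1) with names(new2) is probably the quickest first check before reading the full dput() output.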