On Dec 13, 2012, at 5:01 PM, David Winsemius wrote:

>
> On Dec 13, 2012, at 1:43 PM, Tobias Gauster wrote:
>
>> Hi,
>>
>> I encountered the behavior, that the duplicated method for data.frames gives
>> "false positives" if there are columns of class POSIXct with a clock shift
>> from DST to standard time.
>>
>> time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0, 60*60)
>> time
>> [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
>>
>> df <- data.frame(time, text="foo")
>> duplicated(df)
>> [1] FALSE TRUE
>
> In this instance
>>
>> This is because the timezone is lost after calling paste():
>> do.call(paste, c(df, sep = "\r"))
>
> I suspect the problem arise when 'paste' coerces to character:
>
> > as.character(time)
> [1] "2012-10-28 02:00:00" "2012-10-28 02:00:00"
>
> I think that as.character might get missed since the 'paste' operation is
> done internally.
>
> > as.character(time, usetz=TRUE)
> [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"

This would work as intended if you pre-processed the argument to duplicated with:

> data.frame(lapply(df, as.character, usetz=TRUE) )
                      time text
1 2012-10-28 02:00:00 CEST  foo
2  2012-10-28 02:00:00 CET  foo
> duplicated( data.frame(lapply(df, as.character, usetz=TRUE) ) )
[1] FALSE FALSE

>
> --
> David.
>
> [1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"
>>
>> I can't really figure out if this behavior is desired or not. If so, a short
>> warning in ?duplicated could be helpful. It is mentioned how
>> duplicated.data.frame() works, but I didn't find a hint to properly handle
>> POSIXct-objects.
>
> There is no duplicated.POSIXct method
>>
>> My particular problem was to cast a data.frame like this one with cast()
>> (which calls reshape1(), which calls duplicated()):
>>
>> df2 <- data.frame(time, time1=as.numeric(time),
>>                   lab=rep(1:3, each=2), value=101:106,
>>                   text=rep(c("foo", "bar"), each=3))
>>
>> library(reshape2)
>>
>> Using the column of class POSIXct as a variable in the formula gives:
>> cast(lab*time~text, data=df2, value="value")
>> Aggregation requires fun.aggregate: length used as default
>>   lab                time bar foo
>> 1   1 2012-10-28 02:00:00   0   2
>> 2   2 2012-10-28 02:00:00   1   1
>> 3   3 2012-10-28 02:00:00   2   0
>>
>> Converting to numeric, casting and converting back works as expected,
>> although the timezone is not visible, because print.data.frame() calls
>> format.POSIXct() with, usetz = FALSE:
>> y <- cast(lab*time1~text, data=df2, value="value")
>> y$time1 <- as.POSIXct("1970-01-01 01:00") + as.numeric(y$time1)
>>
>> Can anyone suggest a more elegant solution?
>>
>> Best,
>> Tobias
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
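
The pre-processing suggested above converts every column to character. If only the POSIXct columns should be touched, something along the following lines might do. This is a minimal sketch, not a tested fix: the helper name with_tz_labels is made up for illustration and is not part of base R, and the two timestamps are built from UTC (an assumption of this example) so that the result does not depend on how the ambiguous local time 02:00 is resolved on a given platform.

```r
# Two distinct instants, one hour apart, that share the wall-clock time
# 02:00 in Vienna on the night DST ends (first CEST, then CET).
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 60 * 60)
attr(time, "tzone") <- "Europe/Vienna"   # same instants, displayed in Vienna time

df <- data.frame(time, text = "foo")

# Hypothetical helper (not part of base R): replace each POSIXct column by its
# character representation including the zone abbreviation, leaving all other
# columns untouched.
with_tz_labels <- function(d) {
  is_time <- vapply(d, inherits, logical(1), what = "POSIXct")
  d[is_time] <- lapply(d[is_time], format, usetz = TRUE)
  d
}

# Rows are now compared on strings that carry CEST/CET, so the two rows
# are no longer considered duplicates of each other.
dup <- duplicated(with_tz_labels(df))
print(dup)
```

The original df is left unchanged, so the POSIXct column is still available afterwards, e.g. df[!dup, ] keeps the time column with its class intact.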

David Winsemius
Alameda, CA, USA