See below. On Mon, Oct 5, 2009 at 6:50 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote: > Try this. First we read a line at a time into L except for the > header. Then we use strapply to match on the given pattern. It > passes the backreferences (the portions within parentheses in the > pattern) to the function (defined via a formula) whose implicit > arguments are x, y and z. That function returns two columns which are > in the required form so that in the next statement we convert one to > chron and the other to numeric. See R News 4/1 for more about dates > and times. > > library(gsubfn) # strapply > library(chron) # as.chron > Lines <- "DATETIME FREQ > 01/09/2009 59.036 > 01/09/2009 00:00:01 58.035 > 01/09/2009 00:00:02 53.035 > 01/09/2009 00:00:03 47.033 > 01/09/2009 00:00:04 52.03 > 01/09/2009 00:00:05 55.025" > L <- readLines(Lines)[-1]
This line should be L <- readLines(textConnection(Lines))[-1] > > pat <- "(../../....) (..:..:..){0,1} *([0-9.]+)" > s <- strapply(L, pat, ~ c(paste(x, y, "00:00:00"), z), simplify = rbind) > > fmt <- "%m/%d/%Y %H:%M:%S" > DF <- data.frame(Time = as.chron(s[,1], fmt), Freq = as.numeric(s[,2])) > > DF > > The final output looks like this: > >> DF > Time Freq > 1 (01/09/09 00:00:00) 59.036 > 2 (01/09/09 00:00:01) 58.035 > 3 (01/09/09 00:00:02) 53.035 > 4 (01/09/09 00:00:03) 47.033 > 5 (01/09/09 00:00:04) 52.030 > 6 (01/09/09 00:00:05) 55.025 > > If the times are unique you could consider making a zoo object out of > it by replacing the DF<- statement with: > > library(zoo) > z <- zoo(as.numeric(s[,2]), as.chron(s[,1], fmt)) > > See the three vignettes in the zoo package. > > > On Mon, Oct 5, 2009 at 5:14 PM, esp <davidgary...@gmail.com> wrote: >> >> Date-Time-Stamp input method to correctly interpret user-specific >> formats:coding is 90% there - based on exmple at >> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html >> ...anyone got the last 10% please? >> >> CONTEXT: >> >> Data is received where one of the columns is a datetimestamp. At midnight, >> the value represented as text in this column consists of just the date part, >> e.g. "01/09/2009". At other times, the value in the column contains both >> date and time e.g. "01/09/2009 00:00:01". The goal is to read it into R as >> an appropriate data type, where for example date arithmetic can be >> performed. As far as I can tell, the most appropriate such data type is >> POSIXct. The trick then is to read in the datetimestamps in the data as >> this type. >> >> PROBLEM: >> >> POSIXct defaults to a text representation almost but not quite like my >> received data. The main difference is that the POSIXct date part is in >> reverse order, e.g. "2009-09-01". It is possible to define a different >> format where date and time parts look like my data but when encountering >> datetimestamps where only the the date part is present (as in the case of my >> midnight data) then this is interpreted as NA i.e. undefined. >> >> SOLUTION (ALMOST): >> >> There is a workaround (based on example at >> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html). It is possible to >> define a class then read the data in as this class. For such a class it is >> possible to define a class method, in terms of a function, for translating a >> text (character string) representation into a value. In that function, one >> can use a conditional expression to treat midnight datetimestamps >> differently from those at other times of day. The example below does that. >> In order to apply this function over all of the datetimestamp values in the >> column, it is necessary to use something like R's 'sapply' function. >> >> SNAG: >> >> The function below implements this approach. A datetimestamp with only the >> date part, including leading zeroes, is always length 10 (characters). It >> correctly interprets the datetimestamp values, but unfortunately translates >> them into what appear to be numeric type. I am actually uncertain precisely >> what is happening, as I am very new to R and have most certainly stretched >> myself in writing this code. I think perhaps it returns a list and >> something associated with this aspect makes it "forget" the data type is >> POSIXct or at least how such a type should be displayed as text or what to >> do about it. >> >> PLEA: >> >> Please, can anyone give any help whatsoever, however tenuous? >> >> CODE, DATA & RESULTS: >> >> Function to Read required data, intended to make the datetime column of the >> data (example given further below) into POSIXct values: >> <<< >> spot_frequency_readin <- function(file,nrows=-1) { >> >> # create temp class >> setClass("t_class2_", representation("character")) >> setAs("character", "t_class2_", function(from) {sapply(from, function(x) { >> if (nchar(x)==10) { >> as.POSIXct(strptime(x,format="%d/%m/%Y")) >> } >> else { >> as.POSIXct(strptime(x,format="%d/%m/%Y %H:%M:%S")) >> } >> } >> ) >> } >> ) >> >> #(for format symbols, see "R Reference Card") >> >> # read the file (TSV) >> file <- read.delim(file, header=TRUE, comment.char = "", nrows=nrows, >> as.is=FALSE, col.names=c("DATETIME", "FREQ"), colClasses=c("t_class2_", >> "numeric") ) >> >> # remove it now that we are done with it >> removeClass("t_class2_") >> >> return(file) >> } >>>>> >> This appears to work apart as regards processing each row of data correctly, >> but the values returned look like numeric equivalents of POSIXct, as opposed >> to the expected character-based (string) equivalents: >> >> >> Example Data: >> <<< >> DATETIME FREQ >> 01/09/2009 59.036 >> 01/09/2009 00:00:01 58.035 >> 01/09/2009 00:00:02 53.035 >> 01/09/2009 00:00:03 47.033 >> 01/09/2009 00:00:04 52.03 >> 01/09/2009 00:00:05 55.025 >>>>> >> >> >> Example Function Call: >> <<< >>> spot = spot_frequency_readin("mydatafile.txt",4) >>>>> >> >> >> Result of Example Function Call: >> <<< >>> spot[1] >> DATETIME >> >> 1 1251759600 >> 2 1251759601 >> 3 1251759602 >> 4 1251759603 >>>>> >> >> >> What I ideally wanted to see (whether or not the time part of the >> datetimestamp at midnight was displayed): >> <<< >>> spot[1] >> DATETIME >> >> 01/09/2009 00:00:00 >> 01/09/2009 00:00:01 >> 01/09/2009 00:00:02 >> 01/09/2009 00:00:03 >> 01/09/2009 00:00:04 >>>>> >> >> >> For the function as defined above using 'sapply' >>> spot[,1] >> 01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009 >> 00:00:03 >> 1251759600 1251759601 1251759602 >> 1251759603 >> >> This was unexpected - it seems to have displayed the datetimestamp values >> both as per my defined character-string representation and as numeric >> values. >> >> Alternatively ifI replace the 'sapply' by a 'lapply' then I get something >> closer to what I expect. It is at least what looks like R's default text >> representation for POSIXct datetimes, even if it is not in my preferred >> format. >> <<< >>> spot[,1] >> >> [[1]] >> [1] "2009-09-01 BST" >> >> [[2]] >> [1] "2009-09-01 00:00:01 BST" >> >> [[3]] >> [1] "2009-09-01 00:00:02 BST" >> >> [[4]] >> [1] "2009-09-01 00:00:03 BST" >>>>> >> >> -- >> View this message in context: >> http://www.nabble.com/Date-Time-Stamp-input-method-for-user-specific-formats-tp25757018p25757018.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.