This is great. I hadn't seen the "as.Date" function before, and was using "as.date" from library(date) (note the lowercase 'd').
I have an alternative which might or might not be faster. If the date is
formatted "yyyymmdd" (e.g. 20070719):

library(date)
formatted <- gsub("^(\\d{4})(\\d{2})(\\d{2})$", "\\2-\\3-\\1", d$yyyymmdd, perl=TRUE)
d$dates <- as.date(formatted)

Since as.date only accepts certain date formats, I had to use gsub to
rearrange the date substrings into mm-dd-yyyy order. as.Date returns
objects of class "Date", whereas as.date returns objects of class "date".
I'm not sure what the differences are, but the simple test below suggests
that the as.date route is slightly faster in elapsed (real) time, given a
character vector of 10000 date entries.

FYI, I ran a quick performance comparison on a 64-bit Linux machine with
a 2.6.9 kernel. The test is very rudimentary, but hopefully useful.

I have two scripts:

###################################### as.date_test.R ##############################################
library(date)
## read the dates as character strings, not numbers or factors
d <- read.table("/tmp/dates", as.is = TRUE, col.names = c("yyyymmdd"), colClasses = c("character"))
## reorder yyyymmdd into mm-dd-yyyy so that as.date can parse it
formatted <- gsub("^(\\d{4})(\\d{2})(\\d{2})$", "\\2-\\3-\\1", d$yyyymmdd, perl=TRUE)
d$dates <- as.date(formatted)
print(nrow(d))
print(d$dates[1:3])
######################################################################################################

###################################### as.Date_test.R ##############################################
## read the dates as character strings, not numbers or factors
d <- read.table("/tmp/dates", as.is = TRUE, col.names = c("yyyymmdd"), colClasses = c("character"))
## as.Date parses yyyymmdd directly when given the format
d$dates <- as.Date(d$yyyymmdd, format = "%Y%m%d")
print(nrow(d))
print(d$dates[1:3])
######################################################################################################

Both scripts read in the same text file containing 10000 date strings,
and then convert them into the appropriate date objects.
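
As an aside, here is a rough sketch of how the same comparison could be timed inside a single R session with system.time(), so that R startup and read.table() are left out of the measurement. The synthetic input and the names yyyymmdd, t_Date and t_date are my own assumptions standing in for /tmp/dates, and the date-package part only runs if that package happens to be installed. Treat it as an illustration, not a careful benchmark.

```r
## Sketch only: time just the conversion step within one R session.
## Assumption: 10000 synthetic "yyyymmdd" strings stand in for /tmp/dates.
set.seed(1)
yyyymmdd <- format(as.Date("1990-08-17") + sample(0:5000, 10000, replace = TRUE),
                   "%Y%m%d")

## base R: as.Date with an explicit format
t_Date <- system.time(d1 <- as.Date(yyyymmdd, format = "%Y%m%d"))
print(t_Date)

## date package: gsub reshuffle plus as.date, only if the package is available
if (requireNamespace("date", quietly = TRUE)) {
  t_date <- system.time({
    formatted <- gsub("^(\\d{4})(\\d{2})(\\d{2})$", "\\2-\\3-\\1",
                      yyyymmdd, perl = TRUE)
    d2 <- date::as.date(formatted)
  })
  print(t_date)
}
```

A single system.time() call on 10000 strings is very short, so the numbers will be noisy; repeating each conversion several times and comparing the elapsed totals would probably be more informative.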
# 10000 date records in a flat file
$ wc -l /tmp/dates
10000 /tmp/dates

# just to illustrate what the dates look like
$ head -2 /tmp/dates
19900817
19900820

# Running each test script 5 times
$ for i in 1 2 3 4 5; do time R --vanilla < as.date_test.R > /dev/null; done
real 0m1.29s   user 0m1.23s   sys 0m0.05s
real 0m1.28s   user 0m1.23s   sys 0m0.06s
real 0m1.28s   user 0m1.22s   sys 0m0.06s
real 0m1.29s   user 0m1.22s   sys 0m0.06s
real 0m1.28s   user 0m1.21s   sys 0m0.07s

$ for i in 1 2 3 4 5; do time R --vanilla < as.Date_test.R > /dev/null; done
real 0m1.65s   user 0m0.99s   sys 0m0.64s
real 0m1.64s   user 0m0.98s   sys 0m0.66s
real 0m1.63s   user 0m0.98s   sys 0m0.65s
real 0m1.64s   user 0m1.00s   sys 0m0.64s
real 0m1.64s   user 0m0.98s   sys 0m0.65s

Notice that the as.date script is slightly faster than the as.Date script
in elapsed time, on average (about 1.28s vs 1.64s real). The as.Date script
actually uses less user CPU time (about 0.99s vs 1.22s); its extra time is
all system time (about 0.65s vs 0.06s). Each run also includes R startup
and read.table, so this is not a pure measure of the conversion itself.

Just thought it was interesting to share. (and thanks to Mark Leeds for
the reference)

Regards,
JB

On 07/18/07 16:13:49, Gavin Simpson wrote:
> On Wed, 2007-07-18 at 12:14 -0700, Mr Natural wrote:
> > Proper calendar dates in R are great for plotting and calculating.
> > However for the non-wonks among us, they can be very frustrating.
> > I have recently discussed the pains that people in my lab have had
> > with dates in R. Especially the frustration of bringing date data into R
> > from Excel, which we have to do a lot.
>
> I've always found the following reasonably intuitive:
>
> Given the csv file that I've pasted in below, the following reads the
> csv file in, formats the dates as class Date and then draws a plot.
>
> I have dates in DD/MM/YYYY format so year is not first - thus attesting
> to R not hating dates in this format ;-)
>
> ## read in csv data
> ## as.is = TRUE stops characters being converted to factors
> ## thus saving us an extra step to convert them back
> dat <- read.csv("date_data.csv", as.is = TRUE)
>
> ## we convert to class Date
> ## format tells R how the dates are formatted in our character strings
> ## see ?strftime for the meaning and available codes
> dat$Date <- as.Date(dat$Date, format = "%d/%m/%Y")
>
> ## check this worked ok
> str(dat$Date)
> dat$Date
>
> ## see nicely formatted dates and not a drop of R-related hatred
> ## but just about the most boring graph I could come up with
> plot(Data ~ Date, dat, type = "l")
>
> And you can keep your Excel file formatted as dates as well - bonus!
>
> Oh, and before you get "Martin'd", it is the chron *package*!
>
> HTH
>
> G
>
> CSV file I used, generated in OpenOffice.org, but I presume it stores
> Dates in the same way as Excel?:
>
> "Data","Date"
> 1,01/01/2007
> 2,02/01/2007
> 3,03/01/2007
> 4,04/01/2007
> 5,05/01/2007
> 6,06/01/2007
> 7,07/01/2007
> 8,08/01/2007
> 9,09/01/2007
> 10,10/01/2007
> 11,11/01/2007
> 10,12/01/2007
> 9,13/01/2007
> 8,14/01/2007
> 7,15/01/2007
> 6,16/01/2007
> 5,17/01/2007
> 4,18/01/2007
> 3,19/01/2007
> 2,20/01/2007
> 1,21/01/2007
> 1,22/01/2007
> 2,23/01/2007
> 3,24/01/2007
>
> > Please find below a simple analgesic for R date importation that I
> > discovered over the last 1.5 days (Learning new stuff in R is
> > calculated in 1/2 days).
> >
> > The function dates() gives the simplest way to get calendar dates into
> > R from Excel that I can find.
> > But straight importation of Excel dates, via a csv or txt file, can be a
> > huge pain (I'll give details for anyone who cares to know).
> >
> > My pain killer is:
> > Consider that you have Excel columns in month, day, year format. Note
> > that R hates date data that does not lead with the year.
> >
> > a. Load the chron library by typing library(chron) in the console.
> > You know that you need this library from information revealed by
> > performing the query,
> > "?dates()" in the Console window. This gives the R documentation
> > help file for this and related time, date functions. In the upper left
> > of the documentation, one sees "dates(chron)". This tells you that you
> > need the library chron.
> >
> > b. Change the format "dates" in Excel to format "general", which gives
> > 5 digit Julian dates. Import the csv file (I use read.csv()) with the
> > Julian dates and other data of interest.
> >
> > c. Now, change the Julian dates that came in with the csv file into
> > calendar dates with the dates() function. Below is my code for
> > performing this activity, concerning an R data file called ss,
> >
> > ss holds the Julian dates, illustrated below from the column MPdate,
> >
> > >ss$MPdate[1:5]
> > [1] 34252 34425 34547 34759 34773
> >
> > The dates() function makes calendar dates from Julian dates,
> >
> > >dmp<-dates(ss$MPdate,origin=c(month = 1, day = 1, year = 1900))
> >
> > > dmp[1:5]
> > [1] 10/12/93 04/03/94 08/03/94 03/03/95 03/17/95
> >
> > I would appreciate the comments of more sophisticated programmers who
> > can suggest streamlining or shortcutting this operation.
> >
> > regards, Don
> >
>
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Gavin Simpson                     [t] +44 (0)20 7679 0522
> ECRC, UCL Geography,              [f] +44 (0)20 7679 0565
> Pearson Building,                 [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London              [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT.
>                                   [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
JB Kim
Morgan Stanley, METL
1585 - 9th Floor
New York, NY 10036
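
P.S. On the Excel step in Don's quoted message: here is a minimal sketch using base R only, assuming the 5-digit numbers come from Excel's default 1900 date system on Windows. The Examples in ?as.Date suggest origin "1899-12-30" for that case. The vector name serial is hypothetical, with the values copied from the ss$MPdate output above. If that assumption holds, the dates come out two days earlier than the dates() result shown in the quote, so it may be worth checking a few values against the spreadsheet.

```r
## Hypothetical vector holding the serial numbers shown for ss$MPdate
serial <- c(34252, 34425, 34547, 34759, 34773)

## Assumption: Excel for Windows, 1900 date system (see the Examples in ?as.Date)
d <- as.Date(serial, origin = "1899-12-30")
print(d)
## [1] "1993-10-10" "1994-04-01" "1994-08-01" "1995-03-01" "1995-03-15"
```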