On Sun, 2006-10-29 at 12:18 -0500, Gabor Grothendieck wrote:
> On 10/29/06, Marc Schwartz <[EMAIL PROTECTED]> wrote:
> > On Sun, 2006-10-29 at 10:31 -0600, tom soyer wrote:
> > > Hi,
> > >
> > > I noticed that as.Date() could not convert date string to date type if the
> > > dates are very old. For example, if the date string is "1-Mar-50", then
> > > as.Date() would convert this to "2050-03-01", NOT "1950-03-01". This seems
> > > to be the behavior of as.Date() for dates older than 1969-1-1, and it is not
> > > documented in the R as.Date() documentation. It seems very strange that R
> > > would fail to convert old dates correctly. Does anyone know if this is the
> > > correct behavior? If so, then which method should one use to convert old
> > > dates?
> > >
> > > Thanks,
> > >
> > > Tom
> > >
> > > P.S., I am using R 2.4.0 for Windows.
> >
> > This is covered in ?strftime, which is also noted in the "See Also"
> > for ?as.Date, where it says:
> >
> > "Your system's help pages on strftime and strptime to see how to specify
> > their formats."
> >
> > In this case, the former help page in R indicates:
> >
> > %y
> >     Year without century (00–99). If you use this on input, which
> >     century you get is system-specific. So don't! Often values up to
> >     69 (or 68) are prefixed by 20 and 70(or 69) to 99 by 19.
> >
> >
> > Thus on FC5 Linux, I get:
> >
> > > as.Date("1-Mar-50", format = "%d-%b-%y")
> > [1] "2050-03-01"
> >
> >
> > Ideally, you should change the representation of the Year component of
> > the dates you are working with to show a full four digit year and then
> > use (note %Y (capital 'Y') instead of %y):
> >
> > > as.Date("1-Mar-1950", format = "%d-%b-%Y")
> > [1] "1950-03-01"
> >
> > If this data was exported from another data source (ie. Excel) change
> > the format in that program prior to exporting.
> >
> > Otherwise, you could do something like this in R using sub():
> >
> > > sub("-([0-9]+)$", "-19\\1", "1-Mar-50")
> > [1] "1-Mar-1950"
> >
> > Which will change the two digit year ('50') to a four digit year
> > ('1950'). See ?sub and ?regexp for more information.
> >
> > HTH,
> >
> > Marc Schwartz
>
> As mentioned in the Help Desk article of Rnews 4-1, chron uses
> the chron.year.expand option with a default of year.expand to do
> the conversion from 2 digit to 4 digit. year.expand has a
> default cutoff of 30, i.e. years after 30 are regarded to be 19xx
> and ones before 30 are 20xx. Thus if that cutoff is ok for you:
>
> library(chron)
> as.Date(chron("1-Mar-50", format = "day-month-year"))
>
> If the cutoff of 30 is ok then we have a solution. It's possible to change
> that in chron, as discussed in the article, but as mentioned there
> it is not recommended that you change the chron.* options since it
> might interfere with making your software interoperable with other software.
>
> You could also do the 2 to 4 digit conversion yourself as suggested by Marc
> or using gsubfn like this (where this example uses a cutoff of 10):
>
> library(gsubfn)
> gsubfn("..$", ~ as.numeric(x) + 100*(as.numeric(x) < 10) + 1900, "1-Mar-50")
>
> This matches the last two digits and then adds 100+1900 if year < 10,
> or adds 1900 if the year is greater, replacing those digits with the new
> 4 digit number. Then we can convert the output of gsubfn using as.Date
> unambiguously with the appropriate format argument.
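
For what it's worth, the do-it-yourself conversion with an explicit
cutoff can also be sketched in base R alone. This is only a rough
illustration: expand_year() and its cutoff argument are hypothetical
names of my own (not part of base R, chron or gsubfn), the cutoff of 30
simply mirrors the chron default mentioned above, and it assumes every
string ends in a hyphen followed by exactly two digits.

```r
# Hypothetical helper, not part of base R, chron or gsubfn.
# Assumes each input ends in "-" followed by exactly two digits.
expand_year <- function(x, cutoff = 30) {
  # Pull out the trailing two-digit year as an integer
  yy <- as.integer(sub("^.*-([0-9]{2})$", "\\1", x))
  # Years below the cutoff go to 20xx, all others to 19xx
  yyyy <- yy + ifelse(yy < cutoff, 2000L, 1900L)
  # Drop the old two digits and append the four-digit year
  paste0(sub("[0-9]{2}$", "", x), yyyy)
}

dates <- expand_year(c("1-Mar-50", "15-Jun-05"))
dates
# [1] "1-Mar-1950"  "15-Jun-2005"

# %b relies on month names matching the current locale (English here)
as.Date(dates, format = "%d-%b-%Y")
```

The point of making the cutoff an argument is that the choice of
century is then visible in your own code rather than left to the
system's handling of %y.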

I would just add that the use of two-digit years was one of the key
issues (among others) surrounding "Y2K"
(http://en.wikipedia.org/wiki/Y2K), and that from an operational
standpoint, representing years in this fashion should be avoided at all
costs. Hence the "So don't!" in the help page that I quoted above.

HTH,

Marc Schwartz