Try this. First we read the raw lines into R and use grepl to drop any line containing a character other than a digit, space or decimal point, which leaves only the year lines and the daily-reading lines. Then we find the year lines (they parse with NA in the second column) and repeat each year down V1 using cumsum. Finally we drop the year lines themselves with na.omit.
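To see what the year-repeat step does before pointing it at the real file, here is the same idea on a tiny inline sample that mimics the file's layout (two years, two days each, and three monthly columns instead of twelve, just for brevity):

```r
## Tiny sample mimicking the file: comments, a blank line, then a year line
## followed by day rows, repeated per year.
sample.lines <- c(
  "Comments about the station",
  "",
  "1910",
  " 1  5.1  4.9  6.0",
  " 2  5.3  5.0  6.1",
  "1911",
  " 1  4.8  4.7  5.9",
  " 2  4.9  4.8  6.0"
)
## drop lines containing anything other than digits, spaces and dots
DF <- read.table(textConnection(sample.lines[!grepl("[^ 0-9.]", sample.lines)]),
                 fill = TRUE)
yr <- is.na(DF$V2)             # TRUE on the bare year lines
DF$V1 <- DF$V1[yr][cumsum(yr)] # repeat each year down its block of days
DF <- na.omit(DF)              # remove the year lines themselves
DF$V1                          # 1910 1910 1911 1911
```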
myURL <- "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"
raw.lines <- readLines(myURL)
## keep only lines made up of digits, spaces and decimal points
DF <- read.table(textConnection(raw.lines[!grepl("[^ 0-9.]", raw.lines)]),
                 fill = TRUE)
## year lines parse with NA in column 2; repeat each year down its block
yr <- is.na(DF$V2)
DF$V1 <- DF$V1[yr][cumsum(yr)]
DF <- na.omit(DF)  # drop the year lines themselves
head(DF)

On Sat, Feb 27, 2010 at 6:32 AM, Tim Coote <tim+r-project....@coote.org> wrote:
> Hullo
> I'm trying to read some time series data of meteorological records that
> are available on the web (eg
> http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat). I'd
> like to be able to read the digital data directly into R. However, I
> cannot work out the right function and set of parameters to use. It could
> be that the only practical route is to write a parser, possibly in some
> other language, reformat the files and then read these into R. As far as
> I can tell, the informal grammar of the file is:
>
> <comments terminated by a blank line>
> [<year number on a line on its own>
> <daily readings lines> ]+
>
> and the <daily readings> are of the form:
> <whitespace> <day number> [<whitespace> <reading on day of month>]12
> (i.e. twelve readings per day row, one per month)
>
> Readings for days in months where a day does not exist have special
> values. Missing values have a different special value.
>
> And then I've got the problem of iterating over all relevant files to get
> a whole timeseries.
>
> Is there a way to read this type of file into R? I've read all of the
> examples that I can find, but cannot work out how to do it. I don't think
> that read.table can handle the separate sections of data representing
> each year. read.ftable maybe can be coerced into parsing the data, but I
> cannot see how after reading the documentation and experimenting with the
> parameters.
>
> I'm using R 2.10.1 on OS X 10.5.8 and 2.10.0 on Fedora 10.
>
> Any help/suggestions would be greatly appreciated.
> I can see that this type of issue is likely to grow in importance, and
> I'd also like to give the data owners suggestions on how to reformat
> their data so that it is easier to consume by machines, while being easy
> to read for humans.
>
> The early records are a serious machine-parsing challenge as they are
> tiff images of old notebooks ;-)
>
> tia
>
> Tim
> Tim Coote
> t...@coote.org
> vincit veritas
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
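On the question of iterating over all the relevant files: the parsing above can be wrapped in a function and mapped over the per-decade URLs. Note the file-name pattern for decades other than 1910-1919 is an assumption on my part -- check the site's index for the actual names before relying on it.

```r
## Sketch: one function per decade file, then rbind the results together.
## The URL pattern for decades beyond 1910-1919 is assumed, not verified.
read.decade <- function(url) {
  raw.lines <- readLines(url)
  DF <- read.table(textConnection(raw.lines[!grepl("[^ 0-9.]", raw.lines)]),
                   fill = TRUE)
  yr <- is.na(DF$V2)             # year lines have NA in column 2
  DF$V1 <- DF$V1[yr][cumsum(yr)] # repeat each year down its block
  na.omit(DF)
}

urls <- sprintf("http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_%d-%d.dat",
                seq(1910, 2000, 10), seq(1919, 2009, 10))
## uncomment when the site is reachable:
## all.years <- do.call(rbind, lapply(urls, read.decade))
```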