On Thu, Jan 27, 2011 at 11:23 PM, H Roark <hrbuil...@hotmail.com> wrote:
>
> I need to import a large number of simple, space-delimited text files with a
> few columns of data each. The one quirk is that some rows are missing data
> and some contain junk text at the end of each line. A typical file might look
> like:
>
> a b c d
> 1 2 3 x
> 4 5 6
> 7 8 9 x
> 1 2 3 x c c
> 4 5 6 x
> 7 8 9 x
>
> I'm trying to avoid having to pre-process the text files, as they all sit on
> an ftp site that I don't manage. My initial approach was just to read the
> files using a read.table() statement with the arguments flush and fill set to
> TRUE. For example, to import the above text file I tried:
>
> read.table(file="ftp://ftp.example.dta", header=T, row.names=NULL, fill=T,
> flush=T)
>
> However, R throws the error "more columns than column names" and won't import
> the file.
>
> Interestingly, if I move the extra text "c c" from line 5 to line 6 in the
> data file, read.table() reads the file just fine, and ignores the "c c". So,
> my first question is, why does simply moving these data down a row solve this
> problem?
>
> Next, I decided to try reading the file with the scan() function and it
> worked perfectly:
>
> data.frame(scan(file="ftp://ftp.example.dta", what=list(a=0, b=0, c=0, d=""),
> sep=" ", skip=1, flush=T, fill=T))
>
> I'm new to R, but as I understand it read.table() is based on the scan()
> function. This makes me wonder if there is an additional argument I can add
> to read.table() to make it import the file successfully, as scan() was able
> to do. Any help in this regard would be very much appreciated. I'd also
> really like to hear folks' perspectives on the merits of scan() versus
> read.table() (e.g. when is scan() the best option?).
>

Read the header into nms and then the data into DF, and then put them
together. Open the connection explicitly first; otherwise scan() opens and
closes it itself, and read.table() would start again from the top of the file:

    con <- file("myfile.dat", "r")              # open for reading
    nms <- scan(con, what = "", nlines = 1)     # header line only
    DF <- read.table(con, fill = TRUE)          # remaining lines
    close(con)
    DF <- setNames(DF[seq_along(nms)], nms)     # keep only the named columns

or just read the file twice, first the one line of the header and then the
data:

    nms <- unlist(read.table("myfile.dat", nrows = 1))
    DF <- read.table("myfile.dat", fill = TRUE, skip = 1)
    DF <- setNames(DF[seq_along(nms)], nms)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
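
For anyone who wants to try the approach without touching the ftp site, here is a minimal, self-contained sketch. It assumes the sample file from the question, written to a temporary file (the name tmp is only for illustration). It also hints at an answer to the "why does moving 'c c' down a row help" question: as far as I can tell from the read.table() documentation, the number of columns is taken from the first five lines of input, so junk on file line 5 widens the table beyond the four header names, while junk on line 6 is never counted.

```r
# Recreate the sample file from the question in a temporary location.
tmp <- tempfile(fileext = ".dat")
writeLines(c("a b c d",
             "1 2 3 x",
             "4 5 6",
             "7 8 9 x",
             "1 2 3 x c c",
             "4 5 6 x",
             "7 8 9 x"), tmp)

# Read the header and the data separately, so the number of header names
# no longer has to match the widest data row.
nms <- scan(tmp, what = "", nlines = 1, quiet = TRUE)
DF <- read.table(tmp, fill = TRUE, skip = 1)

# The data frame may have picked up extra columns from the junk text;
# keep only as many columns as there are header names and label them.
DF <- setNames(DF[seq_along(nms)], nms)
print(DF)
```

The short row ("4 5 6") is padded by fill = TRUE rather than causing an error, and the trailing "c c" ends up in columns that are dropped in the last step.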