On 2011-01-27 20:23, H Roark wrote:

I need to import a large number of simple, space-delimited text files with a 
few columns of data each. The one quirk is that some rows are missing values and 
others have junk text at the end of the line. A typical file might look like:

a b c d
1 2 3 x
4 5 6
7 8 9 x
1 2 3 x c c
4 5 6 x
7 8 9 x

I'm trying to avoid having to pre-process the text files, as they all sit on an 
ftp site that I don't manage.  My initial approach was just to read the files 
using a read.table() statement with the arguments flush and fill set to TRUE. 
For example, to import the above text file I tried:

read.table(file="ftp://ftp.example.dta", header=TRUE, row.names=NULL, fill=TRUE,
           flush=TRUE)

However, R throws the error "more columns than column names" and won't import 
the file.

Interestingly, if I move the extra text "c c" from line 5 to line 6 in the data file, 
read.table() reads the file just fine, and ignores the "c c".  So, my first question is, 
why does simply moving these data down a row solve this problem?


Note this comment in the Details section of ?read.table:

   "The number of data columns is determined by looking
    at the first five lines of input ..."

Peter Ehlers
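
To make that five-line rule concrete, here is a small sketch that rebuilds the sample file from the post in a temporary file and runs the original call twice: once as posted, and once with the long line moved from line 5 to line 6. The helper name try_read() and the use of a local temp file in place of the FTP URL are assumptions made for illustration.

```r
# The sample file from the post, one string per line.
sample_lines <- c("a b c d", "1 2 3 x", "4 5 6", "7 8 9 x",
                  "1 2 3 x c c", "4 5 6 x", "7 8 9 x")

# Field counts in the first five lines, the window read.table() inspects.
window_counts <- lengths(strsplit(sample_lines[1:5], " "))
print(window_counts)

# try_read() is a hypothetical helper: it writes the lines to a temp file,
# runs the call from the post, and returns the data frame or the error text.
try_read <- function(lines) {
  path <- tempfile(fileext = ".txt")
  writeLines(lines, path)
  on.exit(unlink(path))
  tryCatch(
    read.table(path, header = TRUE, row.names = NULL, fill = TRUE, flush = TRUE),
    error = function(e) conditionMessage(e)
  )
}

# Long line on line 5: six columns are detected against four header names,
# so the call stops with an error.
as_posted <- try_read(sample_lines)
print(as_posted)

# Long line moved to line 6: only four columns are detected, and flush = TRUE
# then discards the trailing "c c".
moved <- try_read(sample_lines[c(1:4, 6, 5, 7)])
print(moved)
```

So the column count is fixed before the whole file is read, and flush only helps once that count is already right.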

Next, I decided to try reading the file with the scan() function and it worked 
perfectly:

data.frame(scan(file="ftp://ftp.example.dta", what=list(a=0, b=0, c=0, d=""),
                sep=" ", skip=1, flush=TRUE, fill=TRUE))
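
For anyone who wants to try the scan() approach without the FTP site, the following is a minimal sketch against a local copy of the sample file. The temp file and the variable names are assumptions. It also uses scan()'s default whitespace separator instead of sep=" ", which is a deliberate swap so that repeated or trailing spaces do not create empty fields.

```r
# Local stand-in for the remote file (assumption: same layout as in the post).
path <- tempfile(fileext = ".txt")
writeLines(c("a b c d", "1 2 3 x", "4 5 6", "7 8 9 x",
             "1 2 3 x c c", "4 5 6 x", "7 8 9 x"), path)

# 'what' declares the columns up front, so nothing is inferred from the
# first few lines: three numeric columns and one character column.
cols <- scan(path,
             what  = list(a = 0, b = 0, c = 0, d = ""),
             skip  = 1,      # skip the header line
             flush = TRUE,   # drop anything after the fourth field
             fill  = TRUE,   # pad short lines with empty fields
             quiet = TRUE)

dat <- as.data.frame(cols, stringsAsFactors = FALSE)

# Padded character fields come back as "", so mark them as missing.
dat$d[dat$d == ""] <- NA
print(dat)

unlink(path)
```

Because the column layout is stated explicitly, the position of the over-long line in the file no longer matters.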

I'm new to R, but as I understand it read.table() is based on the scan() 
function. This makes me wonder if there is an additional argument I can add to 
read.table() to make it import the file successfully, as scan() was able to do. 
 Any help in this regard would be very much appreciated.  I'd also really like 
to hear folks' perspectives on the merits of scan() versus read.table() (e.g. 
when is scan() the best option?).
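
On the question of an extra argument for read.table(): one possible route, offered as a sketch and not as the list's recommendation, is to skip the header, pass col.names wide enough for the widest line (found with count.fields()), and then keep only the real columns. The helper name read_ragged() and the local temp files are assumptions for illustration.

```r
# read_ragged() is a hypothetical helper, not part of base R.
read_ragged <- function(path) {
  # Column names come from the header line.
  header_names <- scan(path, what = "", nlines = 1, quiet = TRUE)

  # Widest line anywhere in the file, not just in the first five lines.
  widest <- max(count.fields(path, sep = ""), length(header_names),
                na.rm = TRUE)

  # Enough placeholder names for every field, so read.table() has no reason
  # to complain about extra columns; fill = TRUE pads the short rows.
  wide <- read.table(path, header = FALSE, skip = 1, fill = TRUE,
                     col.names = paste0("V", seq_len(widest)),
                     stringsAsFactors = FALSE)

  # Keep the real columns and restore their names.
  dat <- wide[, seq_along(header_names), drop = FALSE]
  names(dat) <- header_names
  dat
}

# Two local stand-ins for the many remote files.
sample_lines <- c("a b c d", "1 2 3 x", "4 5 6", "7 8 9 x",
                  "1 2 3 x c c", "4 5 6 x", "7 8 9 x")
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
for (p in paths) writeLines(sample_lines, p)

results <- lapply(paths, read_ragged)
print(results[[1]])

unlink(paths)
```

This reads each file three times, so for remote files it may be worth fetching a local copy first, for example with download.file(), and pointing the helper at that copy.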

Cheers
                                        

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
