Re: [R] Read.table problems

2009-05-18 Thread Marc Schwartz

On May 18, 2009, at 11:24 AM, Steve Murray wrote:



Dear all,

I have a file which I've converted from NetCDF (.nc) to text (.txt)  
using ncdump in Unix (as I had problems using the ncdf package to do  
this). The first few rows (as copied and pasted from the Unix  
console) of the file appear as follows:


_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
_, _,
   _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
_, _,
   _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
_, _,
   _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
_, _,
   _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
_, _,
   _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
_, _,
   _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
_, _,
   _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
_, _,
   _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
_, _,
   _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
_, _,



As you can see, there are a lot of NA values before the actual  
numeric values start further down the dataset. My problem is that  
I'm having trouble reading this file into R. I think the problem  
lies with the sep= argument, although I may be wrong. I tried the  
following command at first, as the data appear to be comma separated:


# skip=43 due to meta-data information being held in the initial rows
read.table("test86.txt", skip=43, na.strings="-", header=FALSE, sep=",") -> test86
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 29 did not have 25 elements

I then tried sep=" ", followed by sep="" but received a similar-type  
error message (although line 29 doesn't appear to be especially  
different from the rest).


I subsequently tried using sep="\t" and then sep="\n". Both read the
data in without an error message being displayed, although the data
are formatted as follows:



head(test86)

   V1
1 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
2 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
3 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
4 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
5 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
6 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,




dim(test86)

[1] 179899  1


Instead of one column, I'd expect there to be 720.


I think I'm getting something wrong relating to the sep= argument  
(or possibly mis-using na.strings?). If anyone has any solutions to  
this then I'd be very grateful to hear them.


Many thanks for any advice,

Steve



Two problems:

1. Your first line above has one more column/entry than the subsequent  
lines. If that is correct, you need to use the 'fill = TRUE' argument  
so that all subsequent rows are filled to have the same number of  
columns. If the above is due to a copy/paste error, then disregard this.


2. You are using a '-' (hyphen) as your 'na.strings' character, when  
the data is using a '_' (underscore).
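
The uneven field count described in point 1 can be checked directly
before choosing arguments. A minimal sketch, assuming a small in-memory
sample shaped like the rows quoted above; `sample_lines` is a
hypothetical stand-in for the real file, for which
count.fields("test86.txt", sep = ",", skip = 43) would be the analogous
call:

```r
# Hypothetical sample: the first row has 25 entries and the second has 24,
# each followed by a trailing comma, as in the rows quoted above.
sample_lines <- c(
  paste(rep("_,", 25), collapse = " "),
  paste(rep("_,", 24), collapse = " ")
)

# count.fields() reports the number of fields found on each line
con <- textConnection(sample_lines)
n_fields <- count.fields(con, sep = ",")
close(con)

# More than one distinct count suggests that read.table() needs fill = TRUE
table(n_fields)
```

If table(n_fields) shows a single value, the rows are already of equal
length and 'fill = TRUE' should not be needed.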


Additionally, I would use 'strip.white = TRUE', to aid in getting rid  
of extraneous white space around your fields/separators. That will  
also help with column separations.



Thus (on OSX) with the above data copied to the clipboard:

> read.table(pipe("pbpaste"), na.strings = "_", sep = ",", fill = TRUE, strip.white = TRUE)
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
1  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
3  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
4  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
5  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
6  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
7  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
8  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
9  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
10 NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
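
The same arguments can be tried without a clipboard. A minimal sketch,
assuming an R version in which read.table() accepts the 'text' argument,
and using a hypothetical two-row `sample_lines` in place of the pasted
data:

```r
# Hypothetical sample: 25 entries on the first row, 24 on the second,
# both ending in a trailing comma, as in the data shown in this thread.
sample_lines <- c(
  paste(rep("_,", 25), collapse = " "),
  paste0("   ", paste(rep("_,", 24), collapse = " "))
)

# Same arguments as the pbpaste call: "_" marks missing values, short rows
# are padded with NA, and blanks around the separators are stripped.
test_sample <- read.table(text = sample_lines, na.strings = "_", sep = ",",
                          fill = TRUE, strip.white = TRUE)

dim(test_sample)         # 2 rows, 26 columns (the trailing comma adds one)
all(is.na(test_sample))  # TRUE: every field is either "_" or empty
```

For the file itself, the first argument would be "test86.txt" together
with skip = 43, as in the original command.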




HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posti

[R] Read.table problems

2009-05-18 Thread Steve Murray

Dear all,

I have a file which I've converted from NetCDF (.nc) to text (.txt) using 
ncdump in Unix (as I had problems using the ncdf package to do this). The first 
few rows (as copied and pasted from the Unix console) of the file appear as 
follows:

 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,


As you can see, there are a lot of NA values before the actual numeric values 
start further down the dataset. My problem is that I'm having trouble reading 
this file into R. I think the problem lies with the sep= argument, although I 
may be wrong. I tried the following command at first, as the data appear to be 
comma separated:

> # skip=43 due to meta-data information being held in the initial rows
> read.table("test86.txt", skip=43, na.strings="-", header=FALSE, sep=",") -> test86
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 29 did not have 25 elements

I then tried sep=" ", followed by sep="" but received a similar-type error 
message (although line 29 doesn't appear to be especially different from the 
rest).

I subsequently tried using sep="\t" and then sep="\n". Both read the data in
without an error message being displayed, although the data are formatted as
follows:

> head(test86)
V1
1 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
2 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
3 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
4 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
5 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
6 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 


> dim(test86)
[1] 179899  1


Instead of one column, I'd expect there to be 720.


I think I'm getting something wrong relating to the sep= argument (or possibly 
mis-using na.strings?). If anyone has any solutions to this then I'd be very 
grateful to hear them.
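
One possible reason for the gap between the columns R sees and the 720
expected above: ncdump may wrap each 720-value grid row over many text lines,
so no single line holds a full row. A minimal sketch under that assumption,
using a hypothetical 6-value-wide miniature (`wrapped`, `n_cols`) rather than
the real file, and assuming the metadata lines and any closing lines have
already been dropped:

```r
# Hypothetical miniature: 2 grid rows of 6 values, wrapped at 4 values per line
wrapped <- c("_, _, 1.5, 2,",
             "   3, _, _, 4,",
             "   5, 6, 7, _")
n_cols <- 6  # would be 720 for the file in question

# Split every line on commas, trim blanks, and drop any empty tokens
tokens <- trimws(unlist(strsplit(wrapped, ",", fixed = TRUE)))
tokens <- tokens[nzchar(tokens)]

# Map the "_" fill marker to NA, then convert the rest to numbers
tokens[tokens == "_"] <- NA
values <- as.numeric(tokens)

# Refill row by row into the expected width
grid <- matrix(values, ncol = n_cols, byrow = TRUE)
grid
```

For the real file, the lines read with readLines("test86.txt") (minus the
metadata) would take the place of `wrapped`. The reshape is only meaningful
if the total number of values is a multiple of `n_cols`.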

Many thanks for any advice,

Steve
