On May 18, 2009, at 11:24 AM, Steve Murray wrote:
Dear all,
I have a file which I've converted from NetCDF (.nc) to text (.txt)
using ncdump in Unix (as I had problems using the ncdf package to do
this). The first few rows (as copied and pasted from the Unix
console) of the file appear as follows:
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _,
As you can see, there are a lot of NA values before the actual
numeric values start further down the dataset. My problem is that
I'm having trouble reading this file into R. I think the problem
lies with the sep= argument, although I may be wrong. I tried the
following command at first, as the data appear to be comma separated:
# skip=43 because the initial rows hold metadata
test86 <- read.table("test86.txt", skip=43, na.strings="-", header=FALSE,
sep=",")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
na.strings, :
line 29 did not have 25 elements
I then tried sep=" ", followed by sep="", but received a similar
error message (although line 29 doesn't appear to be especially
different from the rest).
I subsequently tried sep="\t" and then sep="\n". Both read the data
in without an error message, but the data are formatted as follows:
head(test86)
V1
1 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _,
2 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _,
3 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _,
4 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _,
5 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _,
6 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _,
dim(test86)
[1] 179899 1
Instead of one column, I'd expect there to be 720.
I think I'm getting something wrong relating to the sep= argument
(or possibly mis-using na.strings?). If anyone has any solutions to
this then I'd be very grateful to hear them.
Many thanks for any advice,
Steve
Two problems:
1. Your first line above has one more column/entry than the subsequent
lines. If that is correct, you need to use the 'fill = TRUE' argument
so that all subsequent rows are filled to have the same number of
columns. If the above is due to a copy/paste error, then disregard this.
2. You are using a '-' (hyphen) as your 'na.strings' character, when
the data is using a '_' (underscore).
Additionally, I would use 'strip.white = TRUE', to aid in getting rid
of extraneous white space around your fields/separators. That will
also help with column separations.
Thus (on OSX) with the above data copied to the clipboard:
> read.table(pipe("pbpaste"), na.strings = "_", sep = ",", fill =
TRUE, strip.white = TRUE)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19
V20 V21 V22 V23 V24 V25 V26
1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
10 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
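For anyone without a clipboard tool such as pbpaste, the same call can be tried on inline text. What follows is only a minimal sketch: the txt vector is made-up sample data that mimics the pasted rows (trailing commas, a first row one field longer than the rest, and two invented numeric values in the last row), and it relies on the text= argument of read.table, which very old R versions do not have. It also repeats the call with the hyphen from the original attempt, to show what the na.strings mismatch does.

```r
# Made-up sample data (an assumption, not the real test86.txt): each line ends
# with a trailing comma and the first line has one more field than the others.
txt <- c(
  paste(rep("_,", 25), collapse = " "),
  paste(rep("_,", 24), collapse = " "),
  paste(c(rep("_,", 22), "1.5,", "2,"), collapse = " ")
)

# Underscore as the NA marker, ragged rows padded, white space stripped.
good <- read.table(text = txt, sep = ",", na.strings = "_",
                   fill = TRUE, strip.white = TRUE, header = FALSE)

# Same call with the hyphen from the original attempt: "_" is no longer
# treated as missing, so the columns cannot be read as numbers.
bad <- read.table(text = txt, sep = ",", na.strings = "-",
                  fill = TRUE, strip.white = TRUE, header = FALSE)

dim(good)
sapply(good[, 22:24], class)
sapply(bad[, 22:24], class)
```

Whether fill = TRUE is the right fix depends on whether the extra field in the first row is real. Note also that the trailing comma on each line produces one extra, all-NA column at the end, which can be dropped afterwards.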
HTH,
Marc Schwartz