Milan Bouchet-Valat wrote
> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
> quoted integers as an acceptable value for columns for which
> colClasses="integer". But when colClasses is omitted, these columns are
> read as integer anyway.
>
> For example, let's consider a file named file.dat, containing:
> "1"
> "2"
>
> > read.table("file.dat", colClasses="integer")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
> :
> scan() expected 'an integer' and got '"1"'
Hi,

I just ran into a variation of this. I'm teaching myself agent-based modelling from a book that uses NetLogo as the implementation language[1]. NetLogo has a feature called BehaviorSpace that runs models over a varying range of parameter values and makes arbitrary observations at each time step, which it then outputs to a CSV file. One of the exercises involves plotting some graphs of a model run, but the output needs some processing before it can be graphed. Rather than hack away at the data by hand each time I run it, I decided to find a stats package to help, and I chose R.

I'm a complete beginner to R, and I've been using the R in Action early access PDF as a guide[2]. I'm using R 3.1.0 GUI 1.64 Mavericks build (6734).

The NetLogo CSV writer quotes all values, and mixes integers and floats. So a column of data might contain, say (with the quotes actually in the file), "0", "1.25", "1", "2", "3.175". I tried importing the data like this:

    profit <- read.csv("BusinessInvestor1 Profit-table.csv", sep=",", header=TRUE, skip=6)

But then some of the data is read in as factors:

    str(profit)
    'data.frame': 1560 obs. of 9 variables:
     $ X.run.number. : int 8 6 2 7 5 1 3 4 6 8 ...
     $ restrict.sensing.radius : Factor w/ 1 level "false": 1 1 1 1 1 1 1 1 1 1 ...
     $ risk.multiplier : int 1 1 1 1 1 1 1 1 1 1 ...
     $ sensing.radius : int 1 1 1 1 1 1 1 1 1 1 ...
     $ profit.multiplier : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
     $ X.step. : int 0 0 0 0 0 0 0 0 1 1 ...
     $ mean..wealth..of.turtles : Factor w/ 1501 levels "0","100038.136",..: 1 1 1 1 1 1 1 1 623 550 ...
     $ mean..profit..of.patches.with..any..turtles.here. : Factor w/ 1547 levels "2503.675","2582.275",..: 1 8 7 6 5 4 3 10 278 230 ...
     $ mean..failure.probability..of.patches.with..any..turtles.here.: Factor w/ 1558 levels "0.026069451281579437",..: 1504 1528 1508 1518 1516 1514 1512 1536 1321 1471 ...
(For reasons I don't understand, the profit.multiplier parameter – which runs "0.5", "0.6", …, "1" – is imported as a numeric, whereas the observation values get turned into factors.)

I read about colClasses, but this trips over the "quoted integers aren't integers" bug:

    profit <- read.csv("BusinessInvestor1 Profit-table.csv", sep=",", header=TRUE, skip=6,
                       colClasses=c("integer", "logical", "numeric", "numeric", "numeric",
                                    "integer", "numeric", "numeric", "numeric"))
    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
      scan() expected 'an integer', got '"8"'

I created a little script to import the data and use some casts to clean it up. At first I thought it was working, until I realised that this line (to process CSV data in the range 0...1):

    profit$mean_failure_probability_of_inhabited <- as.numeric(profit$mean_failure_probability_of_inhabited)

was producing crazy values:

    str(profit)
    'data.frame': 1560 obs. of 9 variables:
     ...
     $ mean_failure_probability_of_inhabited: num 1504 1528 1508 1518 1516 ...

Eventually I figured out to do this (although I haven't yet figured out why):

    profit$mean_failure_probability_of_inhabited <- as.numeric(as.character(profit$mean_failure_probability_of_inhabited))

Anyway, for a beginner coming to R, this is all REALLY confusing, and it's taken me several hours to get my head round it. Although after reading about it a bit I can see the implementation issues causing this behaviour, as a noob it just feels like "R can't import CSV data". The most baffling thing is that telling R what format the data in each column is in actually *reduces* its ability to read the file! (For a while I thought it was complaining because "8" is an integer, not a real, but now I see it's because it's seeing it as a string.)
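For anyone who lands here from the same search, here is a minimal sketch of what seems to be going on with the as.numeric() surprise, plus one possible workaround for the colClasses error. The values, the temporary file, and the column names (run, step, wealth) are made up for illustration and are not the real NetLogo output. The idea is that as.numeric() on a factor returns the internal level codes rather than the numbers spelled out by the labels, and that colClasses="character" accepts quoted values as strings, which can then be converted explicitly.

```r
# Made-up values standing in for one quoted column of the CSV.
x <- factor(c("0.5", "0.026", "3.175", "0.5"))

# as.numeric() on a factor returns the integer codes of the levels
# (positions in the sorted level set), not the numbers in the labels.
codes <- as.numeric(x)

# Converting to character first recovers the labels, which then parse
# as ordinary numbers.
values <- as.numeric(as.character(x))

# Possible workaround for the colClasses error: read every column as
# character, then convert column by column.
csv_path <- tempfile(fileext = ".csv")  # hypothetical stand-in file
writeLines(c('"run","step","wealth"',
             '"8","0","0"',
             '"6","1","1.25"'), csv_path)

raw <- read.csv(csv_path, colClasses = "character")
demo <- data.frame(run    = as.integer(raw$run),
                   step   = as.integer(raw$step),
                   wealth = as.numeric(raw$wealth))
str(demo)
```

Reading as character and converting afterwards keeps the type decisions explicit, at the cost of one extra line per column.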
My understanding of CSV was the same as the one Peter Meilstrup describes later in the thread: that quotes in a CSV file are there to allow the delimiter character in a value, and don't imply anything about the type of the data (because CSVs are untyped).

Googling the scan() error led to this mailing list thread, so I thought I'd describe my experience. If there's a more intuitive way for read.csv / read.table to work, it might save beginners like me a lot of head-scratching!

Best regards
Ash

[1] http://www.amazon.com/dp/0691136742/
[2] http://www.manning.com/kabacoff2/