Mark: Thanks for the pointers. As suggested, I will explore the scan() method.

Andy: How can I use colClasses in my case? I tried it unsuccessfully and am encountering the following error:

coltypes <- c("numeric","factor","numeric","numeric","numeric","numeric","factor",
              "numeric","numeric","factor","factor","numeric","numeric","numeric",
              "numeric","numeric","numeric","numeric")
mydf <- read.csv("C:/temp/data.csv", header=FALSE, colClasses=coltypes,
                 strip.white=TRUE)

ERROR: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  scan() expected 'a real', got 'V1'

Thanks again.
Sachin

"Liaw, Andy" <[EMAIL PROTECTED]> wrote:

It is much easier to use colClasses in read.table, and in many cases just as fast (or even faster).

Andy

From: Mark Stephens
> From ?scan: "the *type* of 'what' gives the type of data to be read". So
> list(integer(), integer(), double(), raw(), ...). In your code, all columns
> are being read as character, regardless of the contents of the character
> vector.
>
> I have to admit that I added the *'s in *type*. I have been caught out by
> this too. It is not the most convenient way to specify the types of a large
> number of columns, either. As you have a lot of columns, you might want to
> do something like as.list(rep(integer(1), 250)), assuming your dummies are
> together, to save typing. Also, storage.mode() is useful for telling you
> the precise type (and therefore size) of an object; e.g. sapply(coltypes,
> storage.mode) gives the types scan() will actually use. Note that 'numeric'
> could be 'double' or 'integer', which matters in your case for fitting
> inside the 1GB limit, because 'integer' (4 bytes) is half the size of
> 'double' (8 bytes).
>
> Perhaps someone on r-devel could enhance the documentation to make "type"
> stand out in capitals in bold in help(scan)? Or maybe scan() could be
> clever enough to accept a character vector 'what'. Or maybe I'm missing a
> good reason why this isn't possible -- anyone? How about allowing a
> character vector of length one, with each character representing the type
> of that column: e.g. what="IIIIDDCD" would mean 4 integers, then 2 doubles,
> then a character column, followed finally by a double column, 8 columns in
> total. Probably someone somewhere has done that already, but I'm not aware
> that anyone has wrapped it up conveniently.
> On 25/04/06, Sachin J wrote:
> >
> > Mark:
> >
> > Here is the information I didn't provide in my earlier post. The R
> > version is R 2.2.1 running on Windows XP. My dataset has 16 variables
> > with the following data types:
> >
> > ColNumber: 1 2 3 ... 16
> > Datatypes:
> > "numeric","numeric","numeric","numeric","numeric","numeric","character",
> > "numeric","numeric","character","character","numeric","numeric",
> > "numeric","numeric","numeric","numeric","numeric"
> >
> > Variable (2), which is numeric, and the variables denoted as character
> > are to be treated as dummy variables in the regression.
> >
> > A search of the R help list suggested I can also use read.csv with the
> > colClasses option, instead of using scan() and then converting to a
> > data frame as you suggested. I am trying both methods but am unable to
> > resolve a syntax error.
> >
> > coltypes <- c("numeric","factor","numeric","numeric","numeric","numeric",
> >               "factor","numeric","numeric","factor","factor","numeric",
> >               "numeric","numeric","numeric","numeric","numeric","numeric")
> > mydf <- read.csv("C:/temp/data.csv", header=FALSE, colClasses=coltypes,
> >                  strip.white=TRUE)
> >
> > ERROR: Error in scan(file = file, what = what, sep = sep, quote = quote,
> > dec = dec, : scan() expected 'a real', got 'V1'
> >
> > No idea what the problem is.
> >
> > AS PER YOUR SUGGESTION I TRIED scan() AS FOLLOWS:
> >
> > coltypes <- c("numeric","factor","numeric","numeric","numeric","numeric",
> >               "factor","numeric","numeric","factor","factor","numeric",
> >               "numeric","numeric","numeric","numeric","numeric","numeric")
> > x <- scan(file="C:/temp/data.dbf", what=as.list(coltypes), sep=",",
> >           quiet=TRUE, skip=1)
> > names(x) <- scan(file="C:/temp/data.dbf", what="", nlines=1, sep=",")
> > x <- as.data.frame(x)
> >
> > This is working fine but x has no data in it:
> >
> > > x
> > [1]  X._.   NA.    NA..1  NA..2  NA..3  NA..4  NA..5  NA..6  NA..7  NA..8
> >      NA..9  NA..10 NA..11
> > [14] NA..12 NA..13 NA..14 NA..15 NA..16
> > <0 rows> (or 0-length row.names)
> >
> > Please let me know how to properly use scan() or the colClasses option.
> >
> > Sachin
> >
> > *Mark Stephens* wrote:
> >
> > Sachin,
> > With your dummies stored as integer, the size of your object would
> > appear to be 350000 * (4*250 + 8*16) bytes = 376MB. You said "PC" but
> > did not provide R version information, so assuming Windows...
> > With 1GB RAM you should be able to load a 376MB object into memory. If
> > you can store the dummies as 'raw', the object size is only 126MB.
> > You don't say how you attempted to load the data. Assuming your input
> > data is in a text file (or can be), have you tried scan()? Set up the
> > 'what' argument with length 266 and make sure the dummy columns are set
> > to integer() or raw(). Then x = scan(...); class(x) = "data.frame".
> > What is the result of memory.limit()? If it is 256MB or 512MB, then try
> > starting R with --max-mem-size=800M (I forget the exact syntax). Leave
> > a bit of room below 1GB. Once the object is in memory, R may need to
> > copy it once, or a few times. You may need to close all other apps in
> > memory, or send them to swap.
> > I don't really see why your data should not fit into the memory you
> > have. Purchasing an extra 1GB may help. Knowing the object size
> > calculation (as above) should help you gauge whether it is worth it.
> > Have you used a process monitor to watch the memory growing as R loads
> > the data? This can be useful.
> > If all the above fails, then consider 64-bit and purchasing as much
> > memory as you can afford. R can use over 64GB of RAM on 64-bit
> > machines. Maybe you can hire some time on a 64-bit server farm -- I
> > heard it's quite cheap, but I have never tried it myself. You shouldn't
> > need to go that far with this data set, though.
> > Hope this helps,
> > Mark
> >
> >
> > Hi Roger,
> >
> > I want to carry out regression analysis on this dataset, so I believe
> > I can't read the dataset in chunks. Any other solution?
> >
> > TIA
> > Sachin
> >
> > roger koenker <[EMAIL PROTECTED]> wrote:
> > You can read chunks of it at a time and store it in sparse matrix form
> > using the packages SparseM or Matrix, but then you need to think about
> > what you want to do with it.... least squares sorts of things are ok,
> > but other options are somewhat limited...
> >
> > url: www.econ.uiuc.edu/~roger    Roger Koenker
> > email: [EMAIL PROTECTED]         Department of Economics
> > vox: 217-333-4558                University of Illinois
> > fax: 217-244-6678                Champaign, IL 61820
> >
> >
> > On Apr 24, 2006, at 12:41 PM, Sachin J wrote:
> >
> > > Hi,
> > >
> > > I have a dataset consisting of 350,000 rows and 266 columns. Out of
> > > the 266 columns, 250 are dummy variable columns. I am trying to read
> > > this data set into an R dataframe object but am unable to do so due
> > > to memory size limitations (the object size created is too large to
> > > handle in R). Is there a way to handle such a large dataset in R?
> > >
> > > My PC has 1GB of RAM and 55GB of hard disk space, running Windows XP.
> > >
> > > Any pointers would be of great help.
> > >
> > > TIA
> > > Sachin
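[A minimal sketch of one plausible cause of the reported "scan() expected 'a real', got 'V1'" error -- the thread does not confirm this, but the message suggests the file begins with a header line (V1, V2, ...) while header=FALSE tells read.csv to parse that line as data. The file below is made up for illustration.]

```r
# Illustrative CSV whose first line is a header, as the error message hints
tmp <- tempfile(fileext = ".csv")
writeLines(c("V1,V2", "1,a", "2,b"), tmp)

# With header=FALSE, read.csv tries to parse the header text "V1" with
# colClasses "numeric", reproducing the reported error
bad <- tryCatch(
  read.csv(tmp, header = FALSE, colClasses = c("numeric", "factor")),
  error = conditionMessage
)
# bad now holds a message like: scan() expected 'a real', got 'V1'

# Declaring the header makes the same colClasses vector work
ok <- read.csv(tmp, header = TRUE, colClasses = c("numeric", "factor"))
str(ok)  # V1 numeric, V2 factor
```

If this is the cause, the fix for the original call is simply header=TRUE (or deleting the header line), with colClasses matching the actual number of columns.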
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html