Dear all,

Thank you very much for your replies and help. I will try to work with your suggestions and come back to you if I need anything more.
Kind regards,
Stella Pachidi

On Thu, Apr 22, 2010 at 5:30 AM, kMan <kchambe...@gmail.com> wrote:
> Perhaps you set records to NULL (delete, shift up). Or perhaps your system
> is susceptible to butterflies on the other side of the world.
>
> Your code may have 'worked' on a small section of data, but the data used
> did not include all of the cases needed to fully test your code. So... test
> your code!
>
> scan(), used with 'nlines', 'skip', 'sep', and 'what', will cut your read
> time by at least half while using less RAM, let you do most of your post
> processing as you go, and give you something you can test properly. If you
> skip 'nlines', though, you lose the time/memory benefits over read.table().
> 'skip' will get you "right to the point" just before where things failed;
> that would be an interesting small segment of data to test with.
>
> WordPad can read your file (and then some). Eventually.
>
> Sincerely,
> KeithC.
>
> -----Original Message-----
> From: Stella Pachidi [mailto:stella.pach...@gmail.com]
> Sent: Monday, April 19, 2010 2:07 PM
> To: r-h...@stat.math.ethz.ch
> Subject: [R] Huge data sets and RAM problems
>
> Dear all,
>
> This is the first time I am sending mail to the mailing list, so I hope I
> do not make a mistake...
>
> For the last few months I have been working on my MSc thesis project, which
> involves applying data mining techniques to the user logs of a
> software-as-a-service application. The main problem I am experiencing is
> how to process the huge amount of data. More specifically:
>
> I am using R 2.10.1 on a laptop with Windows 7 (32-bit), 2GB RAM and an
> Intel Core Duo 2GHz CPU.
>
> The user log data come from a Crystal Reports query (.rpt file), which I
> transform with some Java code into a tab-separated file.
>
> Although everything runs fine on a small subset of my data, when I increase
> the data set I get several problems:
>
> The first problem is with read.delim(). When I try to read a large amount
> of data (over 2,400,000 rows, with 18 attributes in each row), it does not
> seem to load the whole table into a data frame. In particular, the data
> frame returned has only 1,220,987 rows.
>
> Furthermore, as one of the data attributes is a DateTime, when I try to
> split this column into two columns (one with the Date and one with the
> Time), the returned result is quite strange, as the two new columns appear
> to have more rows than the data frame:
>
> applicLog.dat <- read.delim("file.txt")
> # Process the syscreated column (DateTime --> Date + Time)
> copyDate <- applicLog.dat[["ï..syscreated"]]
> copyDate <- as.character(copyDate)
> splitDate <- strsplit(copyDate, " ")
> splitDate <- unlist(splitDate)
> splitDateIndex <- c(1:length(splitDate))
> sysCreatedDate <- splitDate[splitDateIndex %% 2 == 1]
> sysCreatedTime <- splitDate[splitDateIndex %% 2 == 0]
> sysCreatedDate <- strptime(sysCreatedDate, format = "%Y-%m-%d")
> op <- options(digits.secs = 3)
> sysCreatedTime <- strptime(sysCreatedTime, format = "%H:%M:%OS")
> applicLog.dat[["ï..syscreated"]] <- NULL
> applicLog.dat <- cbind(sysCreatedDate, sysCreatedTime, applicLog.dat)
>
> Then I get the error:
> Error in data.frame(..., check.names = FALSE) :
>   arguments imply differing number of rows: 1221063, 1221062, 1220987
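The row-count mismatch above is what happens when some values do not split into exactly two pieces: an empty field or an extra embedded space shifts every later date/time pair by one position, so the new columns drift out of line with the data frame. A minimal sketch of a per-row split that cannot get out of sync, reusing the file and column names from the code above (the data layout itself is an assumption):

applicLog.dat <- read.delim("file.txt", stringsAsFactors = FALSE)
# split each value separately, so every row yields exactly one pair
parts <- strsplit(as.character(applicLog.dat[["ï..syscreated"]]), " ", fixed = TRUE)
sysCreatedDate <- sapply(parts, function(x) x[1])  # first piece, or NA
sysCreatedTime <- sapply(parts, function(x) x[2])  # second piece, or NA
sysCreatedDate <- as.Date(sysCreatedDate, format = "%Y-%m-%d")
op <- options(digits.secs = 3)
sysCreatedTime <- strptime(sysCreatedTime, format = "%H:%M:%OS")
applicLog.dat[["ï..syscreated"]] <- NULL
applicLog.dat <- cbind(sysCreatedDate, sysCreatedTime, applicLog.dat)

Because every input row contributes exactly one date and one time, the three arguments to cbind() always agree on the number of rows, and malformed values surface as NAs that can be inspected directly.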
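kMan's scan() suggestion might look like the following for this file; the 100,000-row chunk size, the connection handling, and treating all 18 columns as character are assumptions, not part of either message:

con <- file("file.txt", open = "r")
header <- scan(con, what = character(), nlines = 1, sep = "\t", quiet = TRUE)
repeat {
    # 18 character fields per record; successive calls on the open
    # connection continue where the previous chunk ended
    chunk <- scan(con, what = rep(list(""), 18), nlines = 100000,
                  sep = "\t", quiet = TRUE)
    if (length(chunk[[1]]) == 0) break  # end of file
    # post-process each chunk here, e.g. split the DateTime column,
    # then append the result to a file or accumulate a summary
}
close(con)

The 'skip' argument can likewise jump straight to the rows around record 1,220,987 to inspect the segment where read.delim() stopped.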
> Finally, another problem I have is when I perform association mining on
> the data set with the arules package: I turn the data frame into a
> transactions table and then run the apriori algorithm. When I set the
> support too low in order to find the rules I need, the vector of rules
> becomes too big and I run into memory problems such as:
>
> Error: cannot allocate vector of size 923.1 Mb
> In addition: Warning messages:
> 1: In items(x) : Reached total allocation of 153Mb: see help(memory.size)
>
> Could you please help me with how I could allocate more RAM? Or do you
> think there is a way to process the data from disk instead of loading them
> all into RAM? Do you know how I could manage to read my whole data set?
>
> I would really appreciate your help.
>
> Kind regards,
> Stella Pachidi
>
> PS: Do you know of any text editor that can open huge .txt files?
>
> --
> Stella Pachidi
> Master in Business Informatics student
> Utrecht University

--
Stella Pachidi
Master in Business Informatics student
Utrecht University
email: s.pach...@students.uu.nl
tel: +31644478898
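A note on the RAM question above: the warning message points at help(memory.size), which on Windows builds of R inspects and adjusts the process allocation ceiling. A minimal sketch, with an assumed new limit of 2047 MB:

memory.size()              # MB currently in use by R
memory.limit()             # current allocation ceiling, in MB
memory.limit(size = 2047)  # request a larger ceiling, in MB

Raising the ceiling only helps up to what a 32-bit process can address (roughly 2-3 GB), so chunked reading along the lines of the scan() sketch above remains the more reliable route for a data set of this size.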