With those kinds of numbers, I would think a database would be appropriate (instead of spreadsheets).
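Storage aside, a quick way to check whether the data will fit comfortably in RAM is to build a stand-in of the same size and measure it. The sketch below is only an illustration in base R: the 30 numeric columns, the name `cohort`, and the choice of a Spearman test as the "heavier" computation are all assumptions, since the real variables are not known yet.

```r
# Hypothetical stand-in for the cohort: 90,000 rows, 30 numeric columns.
# Column count and types are assumptions; the real data are not yet known.
set.seed(1)
n_obs  <- 90000
n_vars <- 30
cohort <- as.data.frame(matrix(rnorm(n_obs * n_vars), nrow = n_obs))

# Approximate in-memory size of the data frame, in MiB.
size_mb <- as.numeric(object.size(cohort)) / 1024^2
print(round(size_mb, 1))

# Time a rank-based (non-parametric) computation on two of the columns.
elapsed <- system.time(
  res <- cor.test(cohort$V1, cohort$V2, method = "spearman", exact = FALSE)
)
print(elapsed)
```

If the footprint is a small fraction of the available RAM and the timing is acceptable, holding the whole data frame in memory should be fine for this kind of analysis.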
You can begin to assess the performance of R with 90,000 observations with experiments like this:

  # build 30 character columns of 90,000 random letters each
  mydat <- list()
  for (i in 1:30) mydat[[i]] <- sample(letters, size=90000, replace=TRUE)
  mydat2 <- as.data.frame(mydat, stringsAsFactors=FALSE)
  dim(mydat2)
  [1] 90000 30
  # tabulate every column
  lapply(mydat2, table)

-Don

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 3/7/14 7:46 AM, "Marco Barbàra" <jab...@gmail.com> wrote:

>Dear UseRs,
>
>I am going to be involved in the analysis of a cohort of about 90,000
>people. I still don't have the data at hand, but I know that right now
>they are archived in spreadsheet files. So far I have only analysed data
>sets of very small size. I will probably be able to work on a
>relatively fast PC, an i7 with 8 or (I hope) 16 GB RAM. I don't know
>the number of variables, but I think I shouldn't need to use
>anything other than "standard" R (i.e. holding the entire data frame in RAM),
>even if I will probably have to use some non-parametric tools, which
>should be a bit more computer-intensive.
>
>Still, since I have no previous experience, it'd be of great help if
>someone could give me some advice on which ways could be most
>convenient to work in, both from the point of view of databases and of
>data access, or otherwise if there is simply no reason for me to bother
>at all.
>
>I'm not asking for prepackaged solutions, rather for help in
>documentation seeking and links to useful documentation or other
>threads (for example: is it worthwhile using parallel computing?)
>
>Thank you to anyone for reading this email.
>Marco Barbàra.
>
>P.S.: I work on a Debian system, but this shouldn't matter.
>
>______________________________________________
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.