Dear useRs, I am going to be involved in the analysis of a cohort of about 90,000 people. I don't yet have the data at hand, but I know that at present they are stored in spreadsheet files. So far I have only analysed very small data sets. I will probably be able to work on a relatively fast PC, an i7 with 8 or (I hope) 16 GB of RAM. I don't know the number of variables, but I think I shouldn't need anything other than "standard" R (i.e. holding the entire data frame in RAM), even though I will probably have to use some non-parametric tools, which may be more computationally intensive.
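For what it's worth, here is my rough back-of-envelope check; the column count is a pure guess on my part, since I don't have the data yet:

    # Rough memory estimate: 90,000 rows x 200 numeric variables
    # (200 is a guess), 8 bytes per double:
    90000 * 200 * 8 / 1024^2             # ~137 MB, far below 8 GB of RAM

    # Cross-check against an actual object of that shape:
    x <- data.frame(matrix(rnorm(90000 * 200), nrow = 90000))
    print(object.size(x), units = "MB")  # roughly the same figure

If that arithmetic is right, even several copies of the data frame should fit in memory, but I may be missing something about intermediate copies made by the non-parametric routines.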
Still, since I have no previous experience, it would be a great help if someone could advise me on the most convenient way to work, both in terms of database choice and data access, or tell me whether there is simply no reason to worry at all. I'm not asking for prepackaged solutions, but rather for pointers to useful documentation or earlier threads (for example: is it worthwhile to use parallel computing? see my P.P.S.).

Thank you to anyone who reads this email.

Marco Barbàra

P.S.: I work on a Debian system, but this shouldn't matter.
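P.P.S.: To make the parallel-computing question concrete, here is the kind of thing I have in mind; the bootstrap task is just a made-up illustration, not my actual analysis:

    library(parallel)
    x <- rnorm(90000)                        # stand-in for one cohort variable
    boot_median <- function(i) median(sample(x, replace = TRUE))
    # mclapply() forks, so it should work on my Debian box (not on Windows):
    res <- mclapply(1:1000, boot_median, mc.cores = detectCores() - 1)
    quantile(unlist(res), c(0.025, 0.975))   # bootstrap 95% interval for the median

Is this sort of thing worth the trouble at 90,000 rows, or is plain lapply() fast enough in practice?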