Hi,
Although I agree those cases are relatively rare in STATISTICAL analysis, you
can encounter them in simulation work (for a natural catastrophe, a 5-meter
difference in the topography can change all the simulations).

Two ideas (in addition to loading several sections):

1- Search for duplicate cases and estimate your model with a
frequency-weighted scheme; perhaps you don't have 200 million different
'cases'.

2- Take into account the precision/accuracy of your data and of the algorithm
used to fit the model (i.e. there is no need to process 1 million cases to
reach a precision close to .001 if the gradient, or any other function used,
is only accurate to .01).

...

Best regards
Naji

On 14/02/05 18:41, "Berton Gunter" <[EMAIL PROTECTED]> wrote:

>
>>> read all 200 million rows a pipe dream no matter what platform I'm using?
>>
>> In principle R can handle this with enough memory. However, 200 million
>> rows and three columns is 4.8 GB of storage, and R usually needs a few
>> times the size of the data for working space.
>>
>> You would likely be better off not reading the whole data set at once, but
>> loading sections of it from Oracle as needed.
>>
>>
>> -thomas
>>
>
> Thomas's comment raises a question:
>
> Can someone give me an example (perhaps in a private response, since I'm off
> topic here) where one actually needs all cases in a large data set ("large"
> being > 1e6, say) to do a STATISTICAL analysis? By "statistical" I exclude,
> say, searching for some particular characteristic like an adverse event in a
> medical or customer repair database, etc. Maybe a definition of
> "statistical" is: anything that cannot be routinely done in a single pass
> database query.
>
> The reason I ask this is that it seems to me that with millions of cases,
> (careful, perhaps stratified or in some other not completely at random way)
> sampling should always suffice to reduce a dataset to manageable size
> sufficient for the data analysis needs at hand. But my ignorance and naivete
> probably show here.
>
> Thanks.
>
> -- Bert
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
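
P.S. To make the duplicate-case idea concrete, here is a minimal sketch in Python (not R, and not code from anyone in this thread). It assumes a simple straight-line model y = a + b*x purely for illustration; the function names collapse and weighted_line_fit, the digits parameter and the toy rows are all hypothetical. Rows are rounded to a chosen precision, identical rows are counted, and the fit uses each distinct case once with its count as a frequency weight. When the rounding changes nothing, the weighted fit should agree with the fit on all rows up to floating-point error.

```python
from collections import Counter

def collapse(rows, digits=2):
    """Round each (x, y) row to `digits` decimals and count identical rows.

    Returns a list of (x, y, weight) triples, one per distinct case.
    """
    counts = Counter((round(x, digits), round(y, digits)) for x, y in rows)
    return [(x, y, w) for (x, y), w in counts.items()]

def weighted_line_fit(cases):
    """Least-squares fit of y = a + b*x over (x, y, weight) triples.

    Each weight is treated as a frequency: a case with weight 3 counts
    exactly like three identical rows.
    """
    n = sum(w for _, _, w in cases)
    mean_x = sum(w * x for x, _, w in cases) / n
    mean_y = sum(w * y for _, y, w in cases) / n
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in cases)
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in cases)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical toy data: 6 rows, but only 3 distinct cases.
rows = [(1.0, 2.0), (1.0, 2.0), (2.0, 4.1), (2.0, 4.1), (2.0, 4.1), (3.0, 5.9)]

cases = collapse(rows)
a_w, b_w = weighted_line_fit(cases)

# Reference fit using every row with weight 1.
a_full, b_full = weighted_line_fit([(x, y, 1) for x, y in rows])

print(len(rows), "rows ->", len(cases), "distinct cases")
print("intercepts agree:", abs(a_w - a_full) < 1e-9)
print("slopes agree:", abs(b_w - b_full) < 1e-9)
```

The digits argument is where the second idea comes in: if the data or the fitting algorithm is only good to two decimals, rounding to two decimals before counting loses little and can shrink the number of distinct cases considerably. How much it shrinks depends entirely on the data.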