Hi,

I tried take 1, and it failed. I have been traveling (and, with Martin's changes, also waiting for things to stabilize) before trying take 2, which I will probably do later this week. I will send an email if it goes in.

Anyone wanting to try it and run R through check and check-all is welcome to do so and report success or failure.

best wishes
  Robert

Martin Maechler wrote:
>>>>>> "Marcus" == Marcus G Daniels <[EMAIL PROTECTED]>
>>>>>>     on Tue, 12 Dec 2006 09:05:15 -0700 writes:
>
>     Marcus> Vladimir Dergachev wrote:
>     >> Here is the second iteration of data frame subset patch.
>     >> It now passes make check on both 2.4.0 and 2.5.0 (svn as
>     >> of a few days ago). Same speedup as before.
>     >>
>     Marcus> Hi,
>
>     Marcus> I was wondering if this patch would make it into the
>     Marcus> next release. I don't see it in SVN, but it's hard
>     Marcus> to be sure because the mailing list apparently
>     Marcus> strips attachments. If it isn't in, or going to be
>     Marcus> in, is this patch available somewhere else?
>
> I was wondering too.
> http://www.r-project.org/mail.html
> explains what kind of attachments are allowed on R-devel.
>
> I'm particularly interested, since during the last several days
> I've made (somewhat experimental) changes to R-devel,
> which makes some dealings with large data frames that have
> "trivial rownames" (those represented as 1:nrow(.))
> much more efficient.
>
> Notably, as.matrix() of such data frames now no longer produces
> huge row names, and e.g. dim(.) of such data frames has become
> lightning fast [compared to what it was].
>
> Some measurements:
>
> N <- 1e6
> set.seed(1)
> ## we round (for later dump().. reasons)
> x <- round(rnorm(N),2)
> y <- round(rnorm(N),2)
> mOrig <- cbind(x = x, y = y)
> df <- data.frame(x = x, y = y)
> mNew <- as.matrix(df)
> (sizes <- sapply(list(mOrig=mOrig, df=df, mNew=mNew), object.size))
> ## R-2.4.0 (64-bit):
> ##    mOrig       df     mNew
> ## 16000520 16000776 72000560
>
> ## R-2.4.1 beta (32-bit):
> ##    mOrig       df     mNew
> ## 16000296 16000448 52000320
>
> ## R-pre-2.5.0 (32-bit):
> ##    mOrig       df     mNew
> ## 16000296 16000448 16000296
>
> ##------------------------------------
>
> N <- 1e6
> df <- data.frame(x = 0+ 1:N, y = 1+ 1:N)
> system.time(for(i in 1:1000) d <- dim(df))
>
> ## R-2.4.1 beta (32-bit) [deb1]:
> ## [1] 1.920 3.748 7.810 0.000 0.000
>
> ## R-pre-2.5.0 (32-bit) [deb1]:
> ##    user  system elapsed
> ##   0.012   0.000   0.011
>
>
> --- --- --- --- --- --- --- --- --- ---
>
> However, currently
>
> df[2,] ## still internally produces the character(1e6) row names!
>
> something I think we should eliminate as well,
> i.e., at least make sure that only seq_len(1e6) is internally
> produced and not the character vector.
>
> Note however that some of these changes are backward
> incompatible. I do hope that the changes gaining efficiency
> for such large data frames are worth some adaption of
> current/old R source code..
>
> Feedback on this topic is very welcome!
>
> Martin
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel