Thank you, Luke, for your feedback, (more inline below)
>>>>> "Luke" == Luke Tierney <[EMAIL PROTECTED]> >>>>> on Tue, 20 Mar 2007 20:27:18 -0500 (CDT) writes: Luke> On Tue, 20 Mar 2007, Martin Maechler wrote: >> As some of you may have seen / heard in the past, >> it is not possible to make cbind() and rbind() into proper S4 >> generic functions, since their first formal argument is '...'. >> [ BTW: S3-methods for these of course only dispatch on the first >> argument which is also not really satisfactory in the context >> of many possible matrix classes.] >> >> For this reason, after quite some discussion on R-core (and >> maybe a bit on R-devel) about the options, since R-2.2.0 we have >> had S4 generic functions cbind2() and rbind2() (and default methods) >> in R's "methods" which are a version of cbind() and >> rbind() respectively for two arguments (x,y) >> {and fixed 'deparse.level = 0' : the argument names are 'x' and 'y' and >> hence don't make sense to be used to construct column-names or >> row-names for rbind(), respectively.} >> >> We have been defining methods for cbind2() and rbind2() >> for the 'Matrix' classes in late summer 2005 as well. So far so >> good. >> >> In addition, [see also help(cbind2) ], >> we have defined cbind() and rbind() functions which recursively call >> cbind2() and rbind2(), more or less following John Chambers >> proposal of dealing with such "(...)" argument functions. >> These new recursively defined cbind() / rbind() functions >> however have typically remained invisible in the methods package >> [you can see them via methods:::cbind or methods:::rbind ] >> and have been ``activated'' --- replacing base::cbind / rbind --- >> only via an explicit or implicit call to >> methods:::bind_activation(TRUE) >> >> One reason I didn't dare to make them the default was that I >> noticed they didn't behave identically to cbind() / rbind() in >> all cases, though IIRC the rare difference was only in the dimnames >> returned; further, being entirely written in R, and recursive, >> they were slower than the mostly C-based fast cbind() / rbind() >> functions. >> >> As some Bioconductor developers have recently found, >> these versions of cbind() and rbind() that have been >> automagically activated by loading the Matrix package >> can have a detrimental effect in some extreme cases, >> e.g. when using >> do.call(cbind, list_of_length_1000) >> because of the recursion and the many many calls to the S4 >> generic, each time searching for method dispatch ... >> For the bioconductor applications and potentially for others using cbind() / >> rbind() extensively, this can lead to unacceptable performance >> loss just because loading 'Matrix' currently calls >> methods:::bind_activation(TRUE) Luke> The recursion part is potentially problematic because stack space Luke> limitations will cause this to fail for "relatively" short Luke> list_of_length_1000, but that should be easily curable by rewriting Luke> methods:::cbind and methods:::rbind to use iteration rather than Luke> recursion. Yes, thank you, Luke, something I should have at least tried doing but didn't get to. Luke> This might also help a little with efficiency by avoiding Luke> call overhead. It would be interesting to know how much of the Luke> performance hit is dispatch overhead and how much closure call Luke> overhead. If it's dispatch overhead then it may be worth figuring out Luke> some way of handling this with internal dispatch at the C level (at Luke> the cost of maintaining the C level stuff). Luke> My initial reaction to scanning the methods:::cbind code is that it is Luke> doing too much, at least too much R-level work, but I haven't thought Luke> it through carefully. I can well understand that reaction.. The code started up quite small and easily understandable ... until I started trying to emulate the "standard" cbind() / rbind() behavior, notably about automatic colnames / rownames creation. When delving into that the code got the current partial messyness.... >> For this reason, we plan to refrain from doing this activation >> on loading of Matrix, but propose to >> >> 1) define and export >> cBind <- methods:::cbind >> rBind <- methods:::cbind >> >> also do this for R-2.5.0 so that other useRs / packages >> can start cBind() / rBind() in their code when they want to >> have something that can become properly object-oriented Luke> In mackage methods? yes, inside methods, such that in fact there cBind <- cbind (plus namespace export) would be sufficient. In the mean time I'm less sure if this is desirable; but at least this would ``expose'' the code and have it used and consequently hopefully made more efficient (and hopefully even simplified). >> Possibly --- and this is the big RFC (request for comments) --- >> >> 2) __ for 'Matrix' only __ also >> define and export >> cbind <- methods:::cbind >> rbind <- methods:::cbind >> >> I currently see the possibilities of doing >> either '1)' >> or '1) and 2)' >> or less likely '2) alone' >> and like to get your feedback on this. >> "1)" alone would have the considerable drawback for current >> Matrix useRs that their code / scripts which has been using >> cbind() and rbind() for "Matrix" (and "matrix" and "numeric") >> objects no longer works, but needs to be changed to use >> rBind() and cBind() *instead* >> >> As soon as "2)" is done (in conjunction with "1)" or not), >> those who need a very fast but non-OO version of cbind() / rbind() >> need to call base::cbind() or base::rbind() explicitly. >> This however would not be necessary for packages with a NAMESPACE >> since these import 'base' automagically and hence would use >> base::cbind() automagically {unless they also import(Matrix)}. >> >> We are quite interested in your feedback! Luke> Either one seems cleaner to me than having loading of one package Luke> result in mucking about in the internals of another. I agree {{the reason I had chosen the unclean approach was the hope that methods:::cbind could be improved to soon replace base::cbind -- and so "Matrix" would just do something that would happen more universally in the future anyway}} Luke> If we are thinking of these as long term solutions then I think having Luke> different names is cleaner, so 1) but not 2). If we are thinking of Luke> this as a transition towards making base::cbind and base::rbind Luke> support S4 dispatch via cbind2/rbind2 (assuming this can be done Luke> efficiently) then there may be some merit to 2) to minimize the need Luke> for code rewriting. Yes, exactly. My originally intent was strongly in the direction of making base::cbind support S4 via cbind2() and rbind2() -- and one will continue to be able to experiment with that after calling methods:::bind_activation(TRUE). The problem I had underestimated is Luke> assuming this can be done efficiently Luke> It might be worth experimenting with having .Internal(cbind(...)) Luke> check its arguments and call methods:::cbind if (Methods is loaded Luke> and) any of the arguments are S4 -- as the S4 property is now cheap to Luke> determine that may be very low cost especially if done after the Luke> object bits have been checked with positive result. Very nice idea ... currently I don't have the time to do it, so, unless another R-core member (or an avid R-devel reader) jumps in, I don't see this possible for R 2.5.0 (and hence the 2.5.x series). Too bad that nobody else (on R-devel) seems interested enough. In the mean time, I'm proposing to implement '2)' which will give >> > library(Matrix) >> Loading required package: lattice >> >> Attaching package: 'Matrix' >> >> >> The following object(s) are masked from package:base : >> >> cbind, >> rbind every time Matrix is loaded, but that seems appropriate to me. The question remains if it wasn't worth to do '1)' as well {not in Matrix, but the 'methods' package} such that people can more easily get experience with such a cbind2/rbind2 - based version of cbind/rbind. Martin Luke> Best, Luke> luke >> Martin Maechler and Doug Bates <[EMAIL PROTECTED]> Luke> -- Luke> Luke Tierney Luke> Chair, Statistics and Actuarial Science Luke> Ralph E. Wareham Professor of Mathematical Sciences Luke> University of Iowa Phone: 319-335-3386 Luke> Department of Statistics and Fax: 319-335-3017 Luke> Actuarial Science Luke> 241 Schaeffer Hall email: [EMAIL PROTECTED] Luke> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel