On Thu, Oct 28, 2010 at 12:15:56AM -0400, Simon Urbanek wrote:

> > Reason I ask, is I've written some R code which allocates two long
> > lists, and then calls a C function with .Call. My C code writes to
> > those two pre-allocated lists,

> That's bad! All arguments are essentially read-only so you should
> never write into them!

I don't see how. (So, what am I missing?) The R docs themselves state
that the main point of using .Call rather than .C is that .Call does
not do any extra copying and gives one direct access to the R objects.
(This is indeed very useful, e.g. to reorder a large matrix in seconds
rather than hours.)

I could allocate the two lists in my C code, but so far it was more
convenient to do so in R. What possible difference in behavior can
there be between the two approaches?

> R has pass-by-value(!) semantics, so semantically your code has
> nothing to do with the result.1 and result.2 variables since only
> their *values* are guaranteed to be passed (possibly a copy).

Clearly C code called from .Call must be allowed to construct R
objects, as that's how much of R itself is implemented, and further
down, it's what you recommend I should do instead. But why does it
follow that C code must never modify an object initially allocated by
R code? Are you saying there is some special magic difference in the
state of an object allocated by R's C code vs. one allocated by R
code? If so, what is it?

What is the potential problem here, that the garbage collector will
suddenly run while my C code is in the middle of writing to an R list?
Yes, if the gc is going to move the object elsewhere, that would be
very bad. But it looks to me like that cannot happen, because lots of
the R implementation itself would fail badly if it did.

E.g.: The PROTECT call is used to increment reference counts, but I
see no guarantees that it is atomic with the operations that allocate
objects. I see no mutexes or other barriers in C code to prevent the
gc from running, thus implying that it *can't* run until the C
function completes. And R is single threaded, of course. But what
about signal handlers, could they ever invoke R's gc?

Also, I was initially surprised not to find any matrix C APIs, but
grepping for examples (sorry, I don't remember exactly which
functions) showed me that the apparently accepted way to do matrix
operations from C is to simply assume R's column-first dense matrix
order, and access the 2D matrix as a flat 1D vector. (Which is easy.)

> The fact that internally R attempts to avoid copying for performance
> reasons is the only reason why your code may have appeared to work,
> but it's invalid!

I will probably change my code to allocate a new list from the C code
and return that, as you recommend. My main reason for doing the
allocation in R was just that it was simpler, especially given the
very limited documentation of R's C API.

But, I didn't see anything in the "Writing R Extensions" doc saying
that what my code is doing is "invalid", and more importantly, I don't
see why it would or should be invalid...

I'd still like to better understand why you think doing the initial
allocation of an object in R rather than C code is such a problem. So
far, I don't see any way that the R interpreter could ever tell the
difference.

Wait, or is the only objection here that I'm using C in a way that
makes pass-by-reference semantics visible to my R code? Which will
work completely correctly, but is not The Proper R Way?

I don't actually need pass-by-reference behavior here at all, but I
can imagine cases where I might want it, so I'd like to understand
your objections better. Is using C to implement pass-by-reference
actually Broken, or merely Ugly? From my reasons above, I think it
will always work correctly and thus is not Broken. But of course given
R's devotion to pass-by-value, it could be considered unacceptably
Ugly.

-- 
Andrew Piskorski <a...@piskorski.com>
http://www.piskorski.com/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
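
P.S. For anyone finding this in the archives, here is roughly what I
mean above by treating the matrix as a flat 1D vector in column-first
order. This is only a sketch in plain C, with no R headers. The names
cm_index and cm_swap_rows are hypothetical helpers made up for
illustration; they are not part of R's C API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an R API function): offset of element
   (row, col) in a column-major matrix with nrow rows, 0-based. */
size_t cm_index(size_t row, size_t col, size_t nrow)
{
    assert(row < nrow);          /* catch out-of-range row indices */
    return row + col * nrow;     /* each column is contiguous */
}

/* Hypothetical helper: swap rows r1 and r2 in place, across all
   columns, of a column-major matrix stored in the flat array x. */
void cm_swap_rows(double *x, size_t nrow, size_t ncol,
                  size_t r1, size_t r2)
{
    for (size_t j = 0; j < ncol; j++) {
        size_t a = cm_index(r1, j, nrow);
        size_t b = cm_index(r2, j, nrow);
        double tmp = x[a];
        x[a] = x[b];
        x[b] = tmp;
    }
}
```

In code called through .Call, I would expect the flat pointer to come
from something like REAL(x) and the dimensions from nrows(x) and
ncols(x) in Rinternals.h, but take that part as my assumption about
typical usage, not something the sketch itself demonstrates.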