On Mon, Nov 17, 2008 at 1:11 PM, laurent <[EMAIL PROTECTED]> wrote:
> Yes, I have briefly seen that you are doing it at the rpy2.robjects
> level. That's certainly a good way to go when wanting to offer such
> capability.
>
> Regarding SexpVector accepting numpy's one-dimensional arrays, that's
> because anything that provides the sequence protocol is accepted by the
> constructor.

Right.

> A potential drawback of the approach is that there is copying of the
> data from Python/numpy to the embedded R process (and that can be a
> significant penalty on performances when working with large(r) data
> sets), and I hoped I could avoid that.
>
> I am also aware that there is already copying when creating R vectors
> from array.array instances anyway, and the benefit of having something
> working at the rpy2.robjects level right now is looking greater than me
> fussing about an hypothetical perfect solution. It is also always
> possible to copy any voluminous numpy object to R once, then delete that
> numpy object and recreate one from the R object (making both point to
> the same underlying data).

Ah, I see -- I agree that it would be fantastic to be able to pass data
to R without copying, but my impression was that R's internals had much
more simple-minded data structures than numpy, and that made this
impossible. In particular, looking at Rinternals.h, VECTOR_ELT etc. are
defined in terms of calculating direct memory offsets from an SEXPREC.
Not only are there no strides (necessary for column-major R to grok
row-major Numpy arrays), but you can't even have a SEXPREC that points
to an array stored elsewhere in memory -- if you want a vector to be
visible in R space, you have to stick a header struct right on the front
of it.

What might be most useful in practice would be noticing when we are
getting an ndarray that *is* backed by an R vector, and then just
passing on that R vector instead of re-copying it. Right now, users have
to keep track of two different objects -- the SexpVector (or wrapper)
and the ndarray returned by numpy.asarray(), and make sure to use the
correct one in each case -- if you pass the ndarray to an R function,
the data will be copied into R space, even though it's *already* in R
space!
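
To make the stride point above concrete, here is a minimal pure-Python sketch of the two memory layouts. It is not R's or numpy's actual code: `flat_offset`, `flatten`, and `read_as_r` are hypothetical helpers written for this illustration, and the only fact assumed about R is that it indexes a matrix in column-major order, with no per-array strides to say otherwise.

```python
# Illustration only: the same 2x3 matrix stored in a flat buffer under
# row-major (numpy's default) and column-major (R's) conventions.

def flat_offset(index, strides):
    """Offset, in elements, of a multidimensional index given per-axis strides."""
    return sum(i * s for i, s in zip(index, strides))


def flatten(matrix, shape, strides):
    """Write a nested-list matrix into a flat buffer using the given strides."""
    buf = [None] * (shape[0] * shape[1])
    for i in range(shape[0]):
        for j in range(shape[1]):
            buf[flat_offset((i, j), strides)] = matrix[i][j]
    return buf


def read_as_r(buf, shape, i, j):
    """Read element (i, j) the way a fixed column-major reader would.

    The offset formula is hard-coded, so the reader cannot be told that
    the buffer was written with a different layout.
    """
    return buf[i + j * shape[0]]


shape = (2, 3)
matrix = [[1, 2, 3],
          [4, 5, 6]]

row_major_strides = (shape[1], 1)  # moving one row skips a whole row of 3
col_major_strides = (1, shape[0])  # moving one column skips a whole column of 2

row_major_buf = flatten(matrix, shape, row_major_strides)
col_major_buf = flatten(matrix, shape, col_major_strides)

# A column-major reader gets the right answer only from the column-major buffer.
right = read_as_r(col_major_buf, shape, 0, 1)   # element (0, 1) is 2
wrong = read_as_r(row_major_buf, shape, 0, 1)   # reads 3 instead
```

With strides, a reader could be handed the row-major buffer plus `row_major_strides` and still find the right element. Without them, the data has to be rearranged, which means a copy.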
It would be nice if one could just pass the R-backed ndarray directly to
R functions, and they automatically received the corresponding R object.

The downside is that there would be mysteriously inconsistent behavior
when ndarrays were passed to R functions that mutated their arguments --
R-backed ndarrays would be passed by reference and mutated, normal
ndarrays would be passed by value and not mutated.

There's also the implementation difficulty that one needs some way for
the conversion machinery to recognize which ndarrays already have an
associated SEXP, and to find the right SEXP. It's probably possible to
do this in some complicated way using weakrefs and a lookaside table
etc., but much easier would be to just define a custom ndarray subclass,
that stashes that info in an instance variable. And it would be more
obvious which objects would be passed by reference vs. by value, because
they would be different types.

In my perfect world, actually, RArray (or SexpVector!) would *be* that
subclass, since that would simplify programs -- no need for calling
explicit conversion functions (or confusion about which conversion
function to use -- numpy.asarray wouldn't know about the special
subclass, so we'd need our own version, etc.), and the full numpy API
would be directly available, which means that e.g. multidimensional
indexing would Just Work (right now it gives obscure error messages).
Also, the numpy API just feels more natural to me -- __add__ mapping to
c() is neat, but I can't imagine when I would actually use that behavior
in practice, while I can imagine lots of times I would want to do
addition on my numeric arrays :-).
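
A rough sketch of the ndarray-subclass idea described above, assuming numpy is installed. Everything named here is hypothetical and is not part of rpy2: `RBackedArray`, its `r_object` attribute, the `to_r` conversion hook, and `FakeSexp`, which stands in for a real SexpVector. No memory is actually shared with R in this sketch; it only shows how the conversion step could tell the two kinds of array apart.

```python
import numpy as np


class FakeSexp:
    """Stand-in for an R object handle (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name


class RBackedArray(np.ndarray):
    """Hypothetical ndarray subclass that remembers the R object it views."""

    def __new__(cls, input_array, r_object=None):
        obj = np.asarray(input_array).view(cls)
        obj.r_object = r_object
        return obj

    def __array_finalize__(self, obj):
        # Called for every new instance, including slices and results of
        # arithmetic. Those no longer correspond to the whole R vector, so
        # this sketch conservatively drops the link for derived arrays.
        self.r_object = None


def to_r(value):
    """Hypothetical conversion hook: reuse the R object if there is one."""
    r_object = getattr(value, "r_object", None)
    if r_object is not None:
        return ("by-reference", r_object)
    # Plain ndarrays (and derived arrays) would have to be copied into R.
    return ("by-value", list(np.asarray(value).ravel()))


sexp = FakeSexp("x")
backed = RBackedArray([1.0, 2.0, 3.0], r_object=sexp)
plain = np.array([1.0, 2.0, 3.0])

backed_result = to_r(backed)      # passes the existing R object through
plain_result = to_r(plain)        # falls back to copying
slice_result = to_r(backed[1:])   # a slice is treated as a new array
total = backed + backed           # the usual numpy arithmetic still works
```

Dropping `r_object` on slices is one possible design choice. It keeps the by-reference case limited to arrays that map onto a whole R vector, at the cost of copying for views that an implementation with more bookkeeping could in principle pass through.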

-- Nathaniel